Genomic intraspecies variation

Taking advantage of the availability of hundreds of completely sequenced genomes from both humans and flies, we performed the first comparison of the intraspecies variation of genes with their interspecies variation [1]. We found that, among the genes highly conserved between flies and humans, there are outliers that are highly variable within the datasets studied. These outliers are enriched in ribosomal proteins, although the specific proteins differ between humans and flies. Viruses need to hijack the ribosomal machinery to be infective. We therefore interpret this result as a signal of evolutionary pressure in populations to increase the variability of these proteins (likely in loops that do not affect functionality) to avoid recognition by viruses. Variability of these proteins within a species would make it difficult for a virus to infect many members of the population.

The Genome Aggregation Database (gnomAD) provides thousands of naturally occurring variants of the human genome. We used these data to reassess the differences between humans and our closest relative, the chimpanzee, at the level of protein-coding genes by focusing on a set of human-specific variants (6,210 in 4,475 proteins) [2]. These variants are frequent in disordered and low-complexity regions. We highlighted 1,310 of these variants (in 1,095 proteins) that are conserved in a set of 11 non-human primates and are therefore more likely to be associated with human-specific features. Again, we observed an enrichment in terms related to non-globular protein structure, but not in functional terms.
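The selection of variants conserved across non-human primates can be illustrated with a minimal sketch. It is not the pipeline used in [2]: the `Variant` record, the function names, the placeholder species labels and the toy data are all hypothetical. It also assumes that "conserved" means every listed primate carries the same residue at the aligned position and that this residue differs from the human one.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical record for one human-specific variant."""
    protein: str
    position: int
    human_residue: str
    # Residue at the aligned position in each non-human primate
    # (species label -> one-letter amino acid, "-" for an alignment gap).
    primate_residues: dict[str, str]


def is_conserved_in_primates(variant: Variant, species: list[str]) -> bool:
    """Assumed criterion: all listed species share one residue that
    differs from the human residue; gaps or missing species disqualify."""
    residues = [variant.primate_residues.get(s) for s in species]
    if any(r is None or r == "-" for r in residues):
        return False
    return len(set(residues)) == 1 and residues[0] != variant.human_residue


def select_conserved(variants: list[Variant], species: list[str]) -> list[Variant]:
    """Keep only the variants that pass the conservation criterion."""
    return [v for v in variants if is_conserved_in_primates(v, species)]


# Toy data with three placeholder species instead of the 11 primates.
SPECIES = ["sp1", "sp2", "sp3"]
TOY_VARIANTS = [
    Variant("P1", 10, "S", {"sp1": "A", "sp2": "A", "sp3": "A"}),  # conserved
    Variant("P2", 25, "G", {"sp1": "D", "sp2": "E", "sp3": "D"}),  # variable
    Variant("P3", 7, "K", {"sp1": "R", "sp2": "R"}),               # sp3 missing
]

selected = select_conserved(TOY_VARIANTS, SPECIES)
print([(v.protein, v.position) for v in selected])  # [('P1', 10)]
```

Requiring every species to be present and identical is the strictest possible reading of conservation. A real analysis might tolerate missing data or a small number of mismatches.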

References

[1] Shih, J., R. Hodge and M.A. Andrade-Navarro. 2014. Comparison of inter and intraspecies variation in humans and fruit flies. Genomics Data. 3, 49-54.

[2] Mier, P., M.A. Andrade-Navarro and E. Morett. 2025. Apparent differences between human and chimp proteomes are reduced when considering human population: Human specific variants are enriched in disordered and compositionally biased regions. PLOS ONE. In press.