Tracking human population structure through time from whole genome sequences
Authors:
Ke Wang aff001; Iain Mathieson aff002; Jared O’Connell aff003; Stephan Schiffels aff001
Author affiliations:
Department of Archaeogenetics, Max Planck Institute for the Science of Human History, Jena, Germany aff001; Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America aff002; 23andMe Inc., Mountain View, California, United States of America aff003
Published in:
Tracking human population structure through time from whole genome sequences. PLoS Genet 16(3): e1008552. doi:10.1371/journal.pgen.1008552
Category:
Research Article
doi:
https://doi.org/10.1371/journal.pgen.1008552
Abstract
The genetic diversity of humans, like many species, has been shaped by a complex pattern of population separations followed by isolation and subsequent admixture. This pattern, reaching at least as far back as the appearance of our species in the paleontological record, has left its traces in our genomes. Reconstructing a population’s history from these traces is a challenging problem. Here we present a novel approach based on the Multiple Sequentially Markovian Coalescent (MSMC) to analyze the separation history between populations. Our approach, called MSMC-IM, uses an improved implementation of the MSMC (MSMC2) to estimate coalescence rates within and across pairs of populations, and then fits a continuous Isolation-Migration model to these rates to obtain a time-dependent estimate of gene flow. We show, using simulations, that our method can identify complex demographic scenarios involving post-split admixture or archaic introgression. We apply MSMC-IM to whole genome sequences from 15 worldwide populations, tracking the process of human genetic diversification. We detect traces of extremely deep ancestry between some African populations, with around 1% of ancestry dating to divergences older than a million years ago.
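The Isolation-Migration idea mentioned above can be illustrated with a minimal sketch. It is not the MSMC-IM implementation: it assumes a single epoch with constant diploid population sizes `n1` and `n2` and a constant symmetric per-lineage migration rate `m`, and the function names are hypothetical. It computes the probability that two lineages have coalesced by a given time, depending on whether they were sampled from the same population or from different ones.

```python
import numpy as np

def im_generator(n1, n2, m):
    """Rate matrix for two lineages in a two-population model.

    States: 0 = both in pop 1, 1 = one in each, 2 = both in pop 2,
    3 = coalesced (absorbing). n1 and n2 are diploid effective sizes
    and m is a symmetric per-lineage migration rate per generation
    (all assumed constant here for illustration).
    """
    c1, c2 = 1.0 / (2.0 * n1), 1.0 / (2.0 * n2)
    return np.array([
        [-(c1 + 2 * m), 2 * m, 0.0, c1],
        [m, -2 * m, m, 0.0],
        [0.0, 2 * m, -(c2 + 2 * m), c2],
        [0.0, 0.0, 0.0, 0.0],
    ])

def expm(a, terms=20):
    """Matrix exponential by scaling and squaring with a Taylor series."""
    norm = np.abs(a).sum(axis=1).max()
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    a = a / (2 ** s)
    result, term = np.eye(len(a)), np.eye(len(a))
    for k in range(1, terms):
        term = term @ a / k
        result = result + term
    for _ in range(s):
        result = result @ result
    return result

def coalescence_cdf(start_state, t, n1, n2, m):
    """Probability that the two lineages have coalesced by generation t."""
    return expm(im_generator(n1, n2, m) * t)[start_state, 3]
```

With `m = 0` and one lineage in each population, the lineages can never coalesce, whereas any positive `m` gives a nonzero cross-population coalescence probability. A time-dependent model such as the one the abstract describes would replace these constant parameters with functions of time, for example by chaining several such epochs.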
Keywords:
DNA recombination – Gene flow – Genomic libraries – Haplotypes – Human genomics – Introgression – Population size – Simulation and modeling
Article published in the journal
PLOS Genetics
2020 Issue 3