Prioritizing sequence variants in conserved non-coding elements in the chicken genome using chCADD
Authors:
Christian Groß aff001; Chiara Bortoluzzi aff003; Dick de Ridder aff001; Hendrik-Jan Megens aff003; Martien A. M. Groenen aff003; Marcel Reinders aff002; Mirte Bosse aff003
Author affiliations:
Bioinformatics Group, Wageningen University & Research, 6708 PB, Wageningen, The Netherlands aff001
Delft Bioinformatics Lab, University of Technology Delft, 2600 GA, Delft, The Netherlands aff002
Animal Breeding and Genomics Group, Wageningen University & Research, 6708 PB, Wageningen, The Netherlands aff003
Published in:
Prioritizing sequence variants in conserved non-coding elements in the chicken genome using chCADD. PLoS Genet 16(9): e1009027. doi:10.1371/journal.pgen.1009027
Category:
Research Article
doi:
https://doi.org/10.1371/journal.pgen.1009027
Abstract
The availability of genomes for many species has advanced our understanding of the non-protein-coding fraction of the genome. Comparative genomics has proven invaluable for the systematic, genome-wide identification of conserved non-protein-coding elements (CNEs). However, for many non-mammalian model species, including chicken, our ability to interpret the functional importance of variants overlapping CNEs has been limited by current genomic annotations, which rely on a single information type (e.g. conservation). Here we studied CNEs in chicken using a combination of population genomics and comparative genomics. To investigate the functional importance of variants found in CNEs, we developed a ch(icken) Combined Annotation-Dependent Depletion (chCADD) model, a variant effect prediction tool first introduced for humans and later for mouse and pig. We show that 73 Mb of the chicken genome has been conserved across more than 280 million years of vertebrate evolution. The vast majority of the conserved elements are in non-protein-coding regions, which display SNP densities and allele frequency distributions characteristic of genomic regions constrained by purifying selection. By annotating SNPs with the chCADD score, we are able to pinpoint specific subregions of the CNEs that are of higher functional importance, as supported by the association of SNPs in these subregions with known disease genes in humans, mice, and rats. Taken together, our findings indicate that CNEs harbor variants of functional significance that should be the object of further investigation along with protein-coding mutations. We therefore anticipate that chCADD will be of great use to the scientific community and breeding companies in future functional studies in chicken.
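The two population-genetic summaries named in the abstract, SNP density and the allele frequency distribution, can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, coordinates, and allele counts below are hypothetical toy values chosen only to show how each statistic is computed.

```python
from collections import Counter


def snp_density_per_kb(snp_positions, region_start, region_end):
    """SNPs per kilobase in the half-open interval [region_start, region_end)."""
    length = region_end - region_start
    if length <= 0:
        raise ValueError("region must have positive length")
    n_snps = sum(region_start <= pos < region_end for pos in snp_positions)
    return 1000.0 * n_snps / length


def folded_sfs(alt_allele_counts, n_chromosomes):
    """Folded site frequency spectrum: number of SNPs per minor-allele count."""
    spectrum = Counter()
    for alt in alt_allele_counts:
        if not 0 < alt < n_chromosomes:
            continue  # monomorphic in the sample, so not a SNP
        spectrum[min(alt, n_chromosomes - alt)] += 1
    return dict(sorted(spectrum.items()))


# Hypothetical toy data: three SNPs in a 2 kb element, 20 sampled chromosomes.
toy_positions = [105, 480, 910]
toy_alt_counts = [1, 1, 2, 19, 10, 0, 20]

density = snp_density_per_kb(toy_positions, 0, 2000)
spectrum = folded_sfs(toy_alt_counts, 20)
print(density)
print(spectrum)
```

Under purifying selection one would generally expect a lower SNP density and a spectrum shifted toward rare alleles in constrained elements than in neutrally evolving sequence; the sketch only computes the statistics and makes no such comparison.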
Keywords:
Bird genomics – Genome annotation – Genomics – Chickens – Invertebrate genomics – Mammalian genomics – Sequence alignment – Single nucleotide polymorphisms
Article published in
PLOS Genetics
2020, Issue 9