A general framework for functionally informed set-based analysis: Application to a large-scale colorectal cancer study
Authors:
Xinyuan Dong aff001; Yu-Ru Su aff001; Richard Barfield aff001; Stephanie A. Bien aff001; Qianchuan He aff001; Tabitha A. Harrison aff001; Jeroen R. Huyghe aff001; Temitope O. Keku aff003; Noralane M. Lindor aff004; Clemens Schafmayer aff005; Andrew T. Chan aff006; Stephen B. Gruber aff007; Mark A. Jenkins aff008; Charles Kooperberg aff001; Ulrike Peters aff001; Li Hsu aff001
Author affiliations:
aff001: Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
aff002: Department of Biostatistics, University of Washington, Seattle, WA, USA
aff003: Center for Gastrointestinal Biology and Disease, University of North Carolina, Chapel Hill, North Carolina, USA
aff004: Department of Health Science Research, Mayo Clinic, Scottsdale, Arizona, USA
aff005: Department of General Surgery, University Hospital Rostock, Rostock, Germany
aff006: Division of Gastroenterology, Massachusetts General Hospital and Harvard Medical School, and Channing Division of Network Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, Massachusetts, USA
aff007: City of Hope National Medical Center, Duarte, and Department of Preventive Medicine & USC Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California, Los Angeles, California, USA
aff008: Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, University of Melbourne, Melbourne, Victoria, Australia
Published in:
A general framework for functionally informed set-based analysis: Application to a large-scale colorectal cancer study. PLoS Genet 16(8): e1008947. doi:10.1371/journal.pgen.1008947
Category:
Research Article
doi:
https://doi.org/10.1371/journal.pgen.1008947
Abstract
Genome-wide association studies (GWAS) have identified tens of thousands of genetic variants associated with various phenotypes, but together these variants explain only a fraction of heritability, suggesting that many variants have yet to be discovered. Incorporating functional information about genetic variants has recently been recognized as a way to improve power for identifying novel loci. For example, S-PrediXcan and TWAS test the association of predicted gene expression with phenotypes using GWAS summary statistics, leveraging information on the genetic regulation of gene expression, and have found many novel loci. However, because genetic variants may affect more than one gene and act through different mechanisms, these methods likely capture only part of the total effect of these variants. In this paper, we propose a summary statistics-based mixed effects score test (sMiST) that tests for the total effect of a variant set: the effect of the mediator, obtained by imputing genetically predicted gene expression as in S-PrediXcan and TWAS, together with the direct effects of individual variants. It allows for multiple functional annotations and multiple genetically predicted mediators. It can also perform conditional association analysis while adjusting for other genetic variants (e.g., known loci for the phenotype). Extensive simulation and real data analyses demonstrate that sMiST yields p-values that agree well with those obtained from individual-level data, but with substantially improved computational speed. Importantly, sMiST can be applied broadly to GWAS because it requires only summary statistics of genetic variant associations. We apply sMiST to a large-scale GWAS of colorectal cancer using summary statistics from ∼120,000 study participants and gene expression data from the Genotype-Tissue Expression (GTEx) project. We identify several novel and secondary independent genetic loci.
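As an illustration of the mediator component only, the following minimal sketch computes a TWAS-style gene-level z-score from GWAS summary statistics. It is not the sMiST implementation: it assumes standardized genotypes, and the function names, the expression prediction weights `weights`, the variant z-scores `gwas_z`, and the linkage disequilibrium (LD) correlation matrix `ld` are hypothetical inputs chosen for the example. The direct effects of individual variants, which sMiST tests in addition, are not modeled here.

```python
import math
from typing import Sequence


def predicted_expression_z(
    weights: Sequence[float],
    gwas_z: Sequence[float],
    ld: Sequence[Sequence[float]],
) -> float:
    """Gene-level z-score for genetically predicted expression.

    Under the null, the variant z-scores are approximately N(0, R) with R
    the LD correlation matrix, so w'z has variance w'Rw and the statistic
    is Z = w'z / sqrt(w'Rw).
    """
    n = len(weights)
    if len(gwas_z) != n or len(ld) != n or any(len(row) != n for row in ld):
        raise ValueError("weights, gwas_z and ld must have matching dimensions")

    # Weighted sum of variant-level z-scores (numerator w'z).
    numerator = sum(w * z for w, z in zip(weights, gwas_z))

    # Quadratic form w'Rw, the null variance of the weighted sum.
    variance = sum(
        weights[i] * ld[i][j] * weights[j] for i in range(n) for j in range(n)
    )
    if variance <= 0:
        raise ValueError("w'Rw must be positive; check the weights and LD matrix")

    return numerator / math.sqrt(variance)


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    # Toy inputs (made up for illustration): three variants in moderate LD.
    w = [0.5, -0.2, 0.1]
    z = [2.1, -1.3, 0.4]
    R = [
        [1.0, 0.3, 0.1],
        [0.3, 1.0, 0.2],
        [0.1, 0.2, 1.0],
    ]
    gene_z = predicted_expression_z(w, z, R)
    print(f"gene-level z = {gene_z:.3f}, p = {two_sided_p(gene_z):.3g}")
```

Only the variant-level z-scores and a reference LD matrix enter the calculation, which is why a test of this kind can be run without individual-level genotypes.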
Keywords:
Colorectal cancer – Covariance – Gene expression – Gene prediction – Genetic loci – Genetics – Genome-wide association studies – Test statistics
Article published in the journal
PLOS Genetics
2020, Issue 8