Minority-centric meta-analyses of blood lipid levels identify novel loci in the Population Architecture using Genomics and Epidemiology (PAGE) study

Authors: Yao Hu ^aff001; Mariaelisa Graff ^aff002; Jeffrey Haessler ^aff001; Steven Buyske ^aff003; Stephanie A. Bien ^aff001; Ran Tao ^aff004; Heather M. Highland ^aff002; Katherine K. Nishimura ^aff001; Niha Zubair ^aff001; Yingchang Lu ^aff006; Marie Verbanck ^aff006; Austin T. Hilliard ^aff007; Derek Klarin ^aff008; Scott M. Damrauer ^aff011; Yuk-Lam Ho ^aff014; Peter W. F. Wilson ^aff011; Kyong-Mi Chang ^aff012; Philip S. Tsao ^aff017; Kelly Cho ^aff014; Christopher J. O’Donnell ^aff014; Themistocles L. Assimes ^aff017; Lauren E. Petty ^aff005; Jennifer E. Below ^aff005; Ozan Dikilitas ^aff021; Daniel J. Schaid ^aff022; Matthew L. Kosel ^aff022; Iftikhar J. Kullo ^aff021; Laura J. Rasmussen-Torvik ^aff023; Gail P. Jarvik ^aff024; Qiping Feng ^aff025; Wei-Qi Wei ^aff025; Eric B. Larson ^aff026; Frank D. Mentch ^aff027; Berta Almoguera ^aff027; Patrick M. Sleiman ^aff027; Laura M. Raffield ^aff028; Adolfo Correa ^aff029; Lisa W. Martin ^aff030; Martha Daviglus ^aff031; Tara C. Matise ^aff003; Jose Luis Ambite ^aff033; Christopher S. Carlson ^aff001; Ron Do ^aff006; Ruth J. F. Loos ^aff006; Lynne R. Wilkens ^aff034; Loic Le Marchand ^aff034; Chris Haiman ^aff035; Daniel O. Stram ^aff035; Lucia A. Hindorff ^aff036; Kari E. North ^aff002; Charles Kooperberg ^aff001; Iona Cheng ^aff037; Ulrike Peters ^aff001
Authors place of work: Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America ^aff001; Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America ^aff002; Department of Statistics and Biostatistics, Rutgers University, New Brunswick, New Jersey, United States of America ^aff003; Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America ^aff004; The Vanderbilt Genetics Institute, Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America ^aff005; The Charles Bronfman Institute for Personalized Medicine, The Icahn School of Medicine at Mount Sinai, New York, New York, United States of America ^aff006; Palo Alto Veterans Institute for Research, VA Palo Alto Health Care System, Palo Alto, California, United States of America ^aff007; Center for Genomic Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, United States of America ^aff008; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America ^aff009; Boston VA Healthcare System, Boston, Massachusetts, United States of America ^aff010; Emory Clinical Cardiovascular Research Institute, Atlanta, Georgia, United States of America ^aff011; Corporal Michael Crescenz VA Medical Center, Philadelphia, Pennsylvania, United States of America ^aff012; Department of Surgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America ^aff013; Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, Massachusetts, United States of America ^aff014; Atlanta VA Medical Center, Decatur, Georgia, United States of America ^aff015; Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America ^aff016; Department of Medicine, Stanford University School of Medicine, Stanford, California, United States of America ^aff017; VA Palo Alto Health Care System, Palo Alto, California, United States of America ^aff018; Department of Medicine, Harvard Medical School, Boston, Massachusetts, United States of America ^aff019; Department of Epidemiology, Human Genetics & Environmental Sciences, University of Texas School of Public Health, Houston, Texas, United States of America ^aff020; Department of Cardiovascular Medicine, Mayo Clinic, Rochester, Minnesota, United States of America ^aff021; Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, United States of America ^aff022; Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, United States of America ^aff023; Department of Medicine, University of Washington Medical Center, Seattle, Washington, United States of America ^aff024; Department of Medicine, Division of Clinical Pharmacology, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America ^aff025; Kaiser Permanente Washington Health Research Institute, Seattle, Washington, United States of America ^aff026; Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, United States of America ^aff027; Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America ^aff028; Departments of Medicine, Pediatrics, and Population Health Science, University of Mississippi Medical Center, Jackson, Mississippi, United States of America ^aff029; School of Medicine and Health Sciences, George Washington University, Washington, District of Columbia, United States of America ^aff030; Institute for Minority Health Research, University of Illinois at Chicago, Chicago, Illinois, United States of America ^aff031; Department of Medicine, University of Illinois at Chicago, Chicago, Illinois, United States of America ^aff032; Information Sciences Institute, University of Southern California, Marina del Rey, California, United States of America ^aff033; Epidemiology Program, University of Hawaii Cancer Center, Honolulu, Hawaii, United States of America ^aff034; Keck School of Medicine, University of Southern California, Los Angeles, California, United States of America ^aff035; Division of Genomic Medicine, NIH National Human Genome Research Institute, Bethesda, Maryland, United States of America ^aff036; Cancer Prevention Institute of California, Fremont, California, United States of America ^aff037
Published in the journal: PLoS Genet 16(3): e1008684. doi:10.1371/journal.pgen.1008684
Category: Research Article
doi: https://doi.org/10.1371/journal.pgen.1008684

Summary

Lipid levels are important markers for the development of cardio-metabolic diseases. Although hundreds of associated loci have been identified through genetic association studies, the contribution of genetic factors to variation in lipids is not fully understood, particularly in U.S. minority groups. We performed genome-wide association analyses for four lipid traits in over 45,000 ancestrally diverse participants from the Population Architecture using Genomics and Epidemiology (PAGE) Study, followed by a meta-analysis with several European ancestry studies. We identified nine novel lipid loci, five of which showed evidence of replication in independent studies. Furthermore, we discovered one novel gene in a PrediXcan analysis, minority-specific independent signals at eight previously reported loci, and potential functional variants at two known loci through fine-mapping. Systematic examination of known lipid loci revealed smaller effect estimates in African American and Hispanic ancestry populations than those in Europeans, and better performance of polygenic risk scores based on minority-specific effect estimates. Our findings provide new insight into the genetic architecture of lipid traits and highlight the importance of conducting genetic studies in diverse populations in the era of precision medicine.

Keywords:

Genetic loci – Europe – Population genetics – Genome-wide association studies – Metaanalysis – Lipids – Hispanic people – Trait locus analysis

Introduction

Circulating levels of lipids such as high-density lipoprotein cholesterol (HDL-C), low-density lipoprotein cholesterol (LDL-C), total cholesterol (TC), and triglycerides (TG) are associated with atherosclerotic cardiovascular disease, type 2 diabetes, and fatty liver disease [1–3]. Plasma lipid levels are heritable polygenic traits, with twin studies estimating narrow-sense heritability at 0.48 to 0.76 [4]. Genetic association studies have identified over 400 loci associated with lipid traits [5–10, 13]. However, most of these findings came from European ancestry populations; African American (AA), Hispanic, and other minority populations remain underrepresented in these studies.

Previous studies have demonstrated distinct lipid profiles in minority populations compared to Europeans, with higher HDL-C and lower TG levels in African ancestry populations and lower HDL-C and TC levels in Hispanics [11, 12]. In addition, ancestry-specific variants at established lipid loci have been identified in AA and Hispanic ancestry populations; these variants are monomorphic or have extremely low minor allele frequencies (MAFs) in populations of European descent [13]. The phenotypic variance explained by the established loci is considerably lower in U.S. minority populations (8.8–12.3%) than in European ancestry populations (12.9–27.8%) [7, 13], indicating that discovery and fine-mapping in non-European populations have lagged behind and that focused efforts in these populations are needed.

The Population Architecture using Genomics and Epidemiology (PAGE) Study, funded by the National Human Genome Research Institute and the National Institute on Minority Health and Health Disparities, was designed to characterize the genetic architecture of complex traits among underrepresented minority populations through large-scale genetic epidemiological research [14]. As part of this initiative, we developed the Multiethnic Genotyping Array (MEGA) to improve fine-mapping and discovery by increasing variant coverage across multiple ethnicities [15]. Using this array in PAGE enabled us to identify lipid loci that may have been missed by previous Euro-centric GWAS, and to explore the generalizability and heterogeneity of previous findings across major U.S. ethnic groups. We also performed a meta-analysis combining PAGE results with those from the European Network for Genetic and Genomic Epidemiology (ENGAGE) Consortium [16] and other available European GWAS to search for additional novel lipid loci, and sought replication of our new findings in the Million Veteran Program (MVP) [13], the Global Hispanic Lipids Consortium, the Electronic Medical Records and Genomics (eMERGE) Network [17], the Kaiser Permanente Research Bank [18], the Jackson Heart Study (JHS) [19], and the UK Biobank (UKBB, https://www.ukbiobank.ac.uk/).

Results

Study overview

Data on 45,698 participants were included in the minority meta-analysis: 17,641 AAs, 22,830 Hispanics, 2,378 East Asians, 1,912 Native Hawaiians, 604 Native Americans, and 333 others (primarily South Asian, mixed heritage, and other racial/ethnic groups) [20]. These participants were drawn from six large, well-characterized epidemiological studies: the Atherosclerosis Risk in Communities Study (ARIC), the BioMe Biobank (BioMe), the Coronary Artery Risk Development in Young Adults Study (CARDIA), the Hispanic Community Health Study/Study of Latinos (HCHS/SOL), the Multiethnic Cohort Study (MEC), and the Women’s Health Initiative (WHI). An additional 22,887 European ancestry participants from ARIC, BioMe, CARDIA, and WHI with individual-level data in PAGE were included in the minority plus European meta-analyses, along with publicly available summary statistics from ENGAGE in over 62,000 participants (http://diagram-consortium.org/2015_ENGAGE_1KG/) [16]. Across the five major ancestral groups in PAGE, the lowest HDL-C level was observed in Native Hawaiians, the highest LDL-C level in Europeans, and the highest TC and TG levels in Native Americans (Table 1 and S1 Table).

**Tab. 1. Characteristics of the ancestrally diverse populations in PAGE ¹.**

Identification of novel loci

In the first discovery stage, we performed a minority-centric analysis that included 45,698 non-European ancestry participants in PAGE, using a fixed-effect inverse-variance-weighted meta-analysis (S1 Fig). We identified four novel loci for HDL-C (5q31-rs17102282, DLC1-rs11782435, ZCCHC6-rs145312881, and DDHD1-rs75405126) and one novel locus for TG (MTHFD2-rs182013227) (Table 2 and S2 Fig). These five novel loci remained genome-wide significant after adjustment for previously established variants on the same chromosome (P_condition < 5.0E-8), and none showed significant evidence of heterogeneity across studies (S2 Table). Ancestry-stratified analyses identified the ancestral populations that contributed most to these novel loci (S3 Table). MTHFD2-rs182013227 is polymorphic only in AAs (MAF = 0.004). DDHD1-rs75405126 is monomorphic in AA and Hispanic populations, and its association was driven mainly by the signal in Native Hawaiians (MAF = 0.008, P_Hawaiian = 2.7E-7, S3 Table). In addition to the five novel loci from the minority-combined meta-analysis, we discovered one novel TC locus, PCSK1-rs903381, in the Hispanic-specific meta-analysis (Table 2 and S2 Fig).
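The fixed-effect inverse-variance-weighted meta-analysis used in the first discovery stage can be sketched as follows. This is a minimal illustration, not the PAGE analysis code: each study's effect estimate is weighted by 1/SE², and Cochran's Q and I² are included as one common way to quantify between-study heterogeneity (the text does not state which heterogeneity statistic was used). The function name and the two-study input are hypothetical.

```python
import math

def ivw_fixed_effect(betas, ses):
    """Fixed-effect inverse-variance-weighted (IVW) meta-analysis of one variant.

    betas, ses: per-study effect estimates and their standard errors.
    Returns the pooled estimate, its SE, Z, two-sided P, Cochran's Q and I^2.
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("need one SE per effect estimate")
    weights = [1.0 / se ** 2 for se in ses]            # w_i = 1 / SE_i^2
    w_total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_total
    se = math.sqrt(1.0 / w_total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))             # two-sided normal P value
    # Cochran's Q: weighted squared deviations of study estimates from the pooled one
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0      # share of variation beyond chance
    return {"beta": beta, "se": se, "z": z, "p": p, "q": q, "i2": i2}

# Hypothetical per-study estimates for one variant (not values from the paper).
result = ivw_fixed_effect([0.10, 0.12], [0.02, 0.02])
```

With equal SEs the pooled estimate is simply the mean (0.11 here), and Q falls below its degrees of freedom, so I² is 0.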

**Tab. 2. Novel loci identified in the discovery stage.**

In the second discovery stage, we performed a minority plus European ancestry meta-analysis in over 131,000 participants (S1 Fig). We identified three additional novel loci (HLF-rs12940636 for HDL-C, B4GALNT3-rs35882350 for LDL-C and TC, and GPCPD1-rs3747910/rs199986018 for LDL-C/TC), which remained genome-wide significant after adjustment for all established variants (P_condition < 5.0E-8, Table 2 and S4 Table). The two lead variants at the GPCPD1 locus are highly correlated (r² = 0.97). All three loci showed evidence of association in both minority and European ancestry populations, with no significant heterogeneity across studies (S4 Table), and their lead variants were common across ancestral groups (Table 2). For the six novel loci identified in the first discovery stage, the association signals were either attenuated in this combined analysis (5q31, DLC1, and DDHD1) or the variants were unavailable in European ancestry populations because of low MAF (ZCCHC6) or monomorphism (PCSK1 and MTHFD2).

Next, we sought replication of the nine newly discovered loci listed in Table 2 in MVP, the Global Hispanic Lipids Consortium, eMERGE, Kaiser, JHS, and UKBB; details of these replication studies are provided in S1 Text. For each novel locus, study-specific results were combined through sample-size-weighted meta-analysis. Three novel loci were successfully replicated (HLF, B4GALNT3, and GPCPD1; P_discovery+replication < 5.0E-8), and two additional novel loci showed suggestive evidence of replication (5q31 and DLC1; P_replication < 0.05), with smaller effect estimates in the replication results (S5 Table). The failure to replicate the DDHD1 locus, which was driven by Native Hawaiian signals, may have resulted from the absence of ancestry-matched replication studies.
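Sample-size-weighted meta-analysis is often implemented as a weighted Z-score (Stouffer) method, as in METAL's sample-size scheme. The sketch below assumes that scheme, with each study's signed Z weighted by √N, because the text does not give the exact formula; the function name and inputs are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and SD 1

def sample_size_weighted_z(betas, pvalues, sample_sizes):
    """Stouffer-style meta-analysis weighting each study's signed Z by sqrt(N)."""
    numerator = 0.0
    for beta, p, n in zip(betas, pvalues, sample_sizes):
        if not 0.0 < p <= 1.0:
            raise ValueError("P values must lie in (0, 1]")
        z = -_STD_NORMAL.inv_cdf(p / 2.0)                    # |Z| implied by a two-sided P
        numerator += math.sqrt(n) * math.copysign(z, beta)   # sign taken from effect direction
    z_meta = numerator / math.sqrt(sum(sample_sizes))
    p_meta = math.erfc(abs(z_meta) / math.sqrt(2.0))         # two-sided P of the combined Z
    return z_meta, p_meta

# Hypothetical replication studies of equal size.
z_same, p_same = sample_size_weighted_z([0.1, 0.1], [0.05, 0.05], [1000, 1000])
z_opp, p_opp = sample_size_weighted_z([0.1, -0.1], [0.05, 0.05], [1000, 1000])
```

Because only P values, effect directions, and sample sizes are needed, this scheme can combine studies whose effect estimates are on different scales; concordant studies reinforce each other, while opposite directions cancel.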

PrediXcan

We performed a PrediXcan analysis to identify associations between lipid traits and the heritable component of gene expression (GREx) in liver, adipose tissue, and whole blood. GREx of SCN11A, which does not map to any known lipid locus, was significantly associated with TG in visceral adipose tissue (P = 1.4E-6, S6 Table). The SCN11A gene encodes a voltage-gated sodium channel alpha subunit involved in the generation and propagation of action potentials in neurons and muscles [21]. Replication is needed to confirm this novel finding. In addition, we identified 37 genes mapped to 19 previously reported loci that were significantly associated with at least one of the four lipid traits (P < 2.0E-6, S6 Table). At 16 of these 19 loci, the significant genes corresponded to the expected biological candidate gene(s), whereas at the other three loci, long intergenic noncoding RNAs (AC067959.1 near APOB, and AP000770.1 and AP006216.11 near APOA5) or a pseudogene (HNRNPA1P10 near DOCK6/ANGPTL8) reached significance instead (S6 Table). Failure to detect associations with well-established candidate genes at some known loci likely reflects, at least in part, reduced statistical power due to small reference sample sizes or to applying weights derived from primarily European reference transcriptomes to minority populations [22]. Associations with genes other than the candidate genes in these regions likely reflect shared variant predictors across gene models and correlated expression among nearby genes.
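As a rough illustration of the PrediXcan idea, predicted expression (GREx) is a weighted sum of allele dosages over a gene's model SNPs, and the trait is then regressed on that prediction. The sketch below uses simulated genotypes and hypothetical weights; real analyses would use published tissue-specific prediction models and adjust for covariates such as principal components, which are omitted here.

```python
import numpy as np

def predicted_expression(dosages, weights):
    """GREx for each person: sum over model SNPs of weight x allele dosage (0-2)."""
    return dosages @ weights

def grex_association(trait, grex):
    """Regress the trait on standardized GREx; returns (beta, SE, Z)."""
    x = (grex - grex.mean()) / grex.std()
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, trait, rcond=None)
    resid = trait - X @ coef
    sigma2 = resid @ resid / (len(trait) - X.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return coef[1], se, coef[1] / se

# Simulated example: three hypothetical eQTL SNPs with hypothetical model weights.
rng = np.random.default_rng(1)
n = 2000
dosages = rng.binomial(2, [0.2, 0.4, 0.3], size=(n, 3)).astype(float)
weights = np.array([0.3, -0.2, 0.1])
grex = predicted_expression(dosages, weights)
trait = 0.5 * (grex - grex.mean()) / grex.std() + rng.normal(size=n)
beta, se, z = grex_association(trait, grex)
```

The test is of the genetically predicted expression, not of measured expression, which is why the weights (and the ancestry of the reference transcriptome they come from) matter for power.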

Evaluation of previously established loci in PAGE minority ancestry populations

We restricted the evaluation of 433 previously reported loci to the 33,063 minority ancestry participants uniformly genotyped on the MEGA array, because of its higher coverage of variants across diverse ancestral groups and better imputation quality [15]. In total, 276 unique variants at 30 previously established loci and 839 unique variants at 240 loci were associated with at least one lipid trait at P ≤ 5.0E-8 and P ≤ 0.05, respectively (S7 Table). To estimate allelic heterogeneity at the established loci, we added a SNP by principal component (SNP×PC) interaction term to the main model in the MEGA minority populations; after Bonferroni correction (P_SNP×PC ≤ 1.8E-4, i.e., 0.05/276 SNPs), 31 variants at six replicated loci (CETP, TOMM40, TRIB1, BUD13, GCKR, and GFOD2) showed significant heterogeneity (S7 Table).
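The SNP×PC heterogeneity test and the Bonferroni threshold can be sketched with ordinary least squares. This minimal version assumes a single principal component and no other covariates, a simplification of the main model described in the text; the data are simulated and the function names are hypothetical.

```python
import math
import numpy as np

def ols_coef_se(y, X):
    """Ordinary least squares coefficients and their standard errors."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return coef, se

def snp_by_pc_interaction(trait, snp, pc):
    """Two-sided P for the SNP x PC term in: trait ~ SNP + PC + SNP*PC (no other covariates)."""
    X = np.column_stack([np.ones(len(snp)), snp, pc, snp * pc])
    coef, se = ols_coef_se(trait, X)
    z = coef[3] / se[3]
    return float(z), math.erfc(abs(z) / math.sqrt(2.0))

# Bonferroni threshold used in the text: 0.05 / 276 variants (about 1.8E-4)
threshold = 0.05 / 276

# Simulated data in which the SNP effect changes along PC1 (hypothetical values).
rng = np.random.default_rng(7)
n = 5000
pc = rng.normal(size=n)
snp = rng.binomial(2, 0.3, size=n).astype(float)
trait = 0.1 * snp + 0.2 * pc + 0.15 * snp * pc + rng.normal(size=n)
z_int, p_int = snp_by_pc_interaction(trait, snp, pc)
```

Because principal components track genetic ancestry, a significant SNP×PC term indicates that the per-allele effect varies with ancestry across the pooled minority sample.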

We then evaluated overall trends in the strength of effect estimates at all known GWAS loci reported by the Global Lipids Genetics Consortium (GLGC) [7] across AA, Hispanic, and European ancestry populations. For all four lipid traits, effect estimates from Hispanic ancestry populations were more strongly correlated with those from European ancestry populations than were effect estimates from AA ancestry populations (Fig 1). The phenotypic variance explained by these reported variants ranged from 13.1% to 23.9%, 12.5% to 20.7%, 11.6% to 17.0%, and 8.89% to 19.6% for HDL-C, LDL-C, TC, and TG, respectively, across AA, Hispanic, and European ancestry populations, with the lowest variance explained in AA populations (S8 Table).
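One common approximation of the phenotypic variance explained by a set of independent variants sums 2p(1-p)β² over variants, using population-specific allele frequencies p and effects β, and divides by the trait variance. The sketch below assumes that approximation (the text does not say how the S8 Table values were computed) and ignores LD between variants; it also shows a simple Pearson correlation of effect estimates between two populations, the kind of comparison plotted in Fig 1. All inputs are hypothetical.

```python
import numpy as np

def variance_explained(effects, allele_freqs, trait_variance):
    """Approximate R^2 from independent variants: sum of 2p(1-p)beta^2, divided by Var(Y)."""
    effects = np.asarray(effects, dtype=float)
    p = np.asarray(allele_freqs, dtype=float)
    return float(np.sum(2.0 * p * (1.0 - p) * effects ** 2) / trait_variance)

def effect_correlation(effects_a, effects_b):
    """Pearson correlation of per-variant effect estimates from two populations."""
    return float(np.corrcoef(effects_a, effects_b)[0, 1])

# Hypothetical single variant: p = 0.5, beta = 0.2 trait SD, Var(Y) = 1.
r2_single = variance_explained([0.2], [0.5], 1.0)
# Effects that are uniformly halved in a second population still correlate perfectly.
corr = effect_correlation([0.1, 0.2, 0.3], [0.05, 0.10, 0.15])
```

The second example shows why correlation and magnitude are separate questions: effects can be attenuated in one population (as observed in AA participants) while their ranking across variants is preserved.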

**Fig. 1. Comparison of effect estimates across AA, Hispanic and European ancestry populations.**

We searched for independent signals at previously known loci in our minority ancestry populations through stepwise conditional analysis, adjusting for the most significant variant at each round. We identified independent signals at 12 known loci, eight of which harbored variants that are monomorphic in European ancestry populations (ABCA1, APOA5, CETP, LCAT, PCSK9, LDLR, APOE, and TM6SF2; S9 Table). Six of these eight loci harbored missense variants (ABCA1-rs9282541, APOA5-rs142953140/rs147210663, LCAT-rs35673026, PCSK9-rs28362263, TM6SF2-rs142056540, and APOE-rs769455) or loss-of-function (LoF) variants (PCSK9-rs28362286/rs67608943; S9 Table).
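Stepwise conditional analysis with individual-level data can be sketched as forward selection: at each round, every remaining variant is tested conditional on those already selected, and the most significant one is added until none passes the threshold. This version assumes a genome-wide threshold of 5E-8 as the stopping rule and omits covariates; the genotypes are simulated, with column 1 built as an LD proxy of the causal column 0, and the function names are hypothetical.

```python
import math
import numpy as np

def conditional_p(y, X, col):
    """Two-sided Wald P for column `col` of design matrix X in an OLS fit of y."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    se = math.sqrt(sigma2 * np.linalg.inv(X.T @ X)[col, col])
    return math.erfc(abs(coef[col] / se) / math.sqrt(2.0))

def stepwise_conditional(y, G, threshold=5e-8):
    """Forward selection of independent signals; returns selected column indices in order."""
    n, m = G.shape
    selected = []
    while True:
        base = np.column_stack([np.ones(n)] + [G[:, j] for j in selected])
        best_j, best_p = None, 1.0
        for j in range(m):
            if j in selected:
                continue
            X = np.column_stack([base, G[:, j]])
            p = conditional_p(y, X, X.shape[1] - 1)   # test j given the selected variants
            if p < best_p:
                best_j, best_p = j, p
        if best_j is None or best_p >= threshold:
            return selected
        selected.append(best_j)

rng = np.random.default_rng(42)
n = 3000
G = rng.binomial(2, 0.3, size=(n, 5)).astype(float)
# Make column 1 an LD proxy of column 0: copy it, then redraw about 30% of entries.
redraw = rng.random(n) < 0.3
G[~redraw, 1] = G[~redraw, 0]
G[redraw, 1] = rng.binomial(2, 0.3, size=int(redraw.sum()))
y = 0.3 * G[:, 0] + 0.2 * G[:, 3] + rng.normal(size=n)
selected = stepwise_conditional(y, G)
```

The proxy in column 1 is strongly associated marginally but drops out once column 0 is in the model, while the second causal variant (column 3) survives as an independent signal.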

Next, we fine-mapped previously reported loci by leveraging the shorter LD ranges in AA populations and by using FINEMAP [23]. At each locus, we counted the variants in strong LD (r² ≥ 0.6) with the lead variant in AA participants genotyped on MEGA, using European- and AA-specific LD matrices from 1000 Genomes Phase 3 data. Among the 11 reported loci that reached genome-wide significance for at least one lipid trait in MEGA AA populations, all except APOE and LDLR (associated with HDL-C) showed a reduction in the number of such SNPs (S10 Table). The most substantial refinement was at the APOA5 locus associated with TG, where the number of variants highly correlated with the lead variant rs3135506 fell from 49 (based on European-specific LD) to zero (based on AA-specific LD; S10 Table), consistent with our previous Metabochip-based findings [24]. We then used FINEMAP to calculate 99% credible sets for the loci reported by GLGC [6], based on GLGC-only data and on combined GLGC plus PAGE minority data. After adding PAGE minority participants, 67% of the known loci showed a reduced number of SNPs or a reduced length of the 99% credible set, and 59% showed a reduction of over 10%. The most substantially fine-mapped region was the VLDLR locus, where the credible set shrank from over 2,000 SNPs to one (rs3780181); recently published evidence supports this variant as the best candidate functional variant at this locus [25]. In addition, after inclusion of the PAGE minority data, the only SNP in the 99% credible sets at the CETP and SORT1 loci was the top hit in the AA- and Hispanic-specific analyses (CETP-rs183130 and SORT1-rs12740374), rather than the top hit in GLGC (S11 Table).
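Counting variants in strong LD with a lead variant can be sketched as below. This assumes r² is computed as the squared correlation of genotype dosages in a reference panel, an approximation to the LD values typically derived from 1000 Genomes data; the toy panel and function name are hypothetical. Running the same function on European- and African-ancestry panels for one locus gives the kind of comparison reported in S10 Table.

```python
import numpy as np

def count_ld_partners(genotypes, lead_idx, r2_threshold=0.6):
    """Number of variants (columns) with r^2 >= r2_threshold to the lead variant.

    genotypes: individuals x variants matrix of allele dosages (0, 1, 2).
    """
    lead = genotypes[:, lead_idx].astype(float)
    count = 0
    for j in range(genotypes.shape[1]):
        if j == lead_idx:
            continue
        g = genotypes[:, j].astype(float)
        if g.std() == 0 or lead.std() == 0:
            continue  # monomorphic in this panel: r^2 is undefined, so skip it
        r = np.corrcoef(lead, g)[0, 1]
        if r * r >= r2_threshold:
            count += 1
    return count

# Toy panel: col 0 = lead, col 1 = copy, col 2 = mirror image (r = -1),
# col 3 = monomorphic, col 4 = uncorrelated with the lead.
panel = np.array([
    [0, 0, 2, 1, 1],
    [1, 1, 1, 1, 2],
    [2, 2, 0, 1, 1],
    [0, 0, 2, 1, 1],
    [1, 1, 1, 1, 0],
    [2, 2, 0, 1, 1],
])
n_partners = count_ld_partners(panel, lead_idx=0)
```

A variant in perfect negative correlation still counts, because r² ignores which allele is tagged.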
In fine-mapping analyses of the nine novel loci, 99% credible sets were estimated from the minority meta-analysis results using FINEMAP; the variants in each credible set are listed in S12 Table.
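FINEMAP models multiple causal variants per locus. As a simpler stand-in, the sketch below builds a 99% credible set under a single-causal-variant assumption using Wakefield approximate Bayes factors; it is a different, swapped-in method, shown only to illustrate what a credible set contains. The prior effect SD of 0.2, the function name, and the variant labels are assumptions.

```python
import math

def credible_set(variant_ids, z_scores, ses, coverage=0.99, prior_sd=0.2):
    """Single-causal-variant credible set from Wakefield approximate Bayes factors."""
    w = prior_sd ** 2                                   # prior variance of the causal effect
    log_abf = []
    for z, se in zip(z_scores, ses):
        r = w / (se ** 2 + w)
        # log ABF = 0.5*log(V/(V+W)) + (z^2/2) * W/(V+W), with V = se^2
        log_abf.append(0.5 * math.log(1.0 - r) + 0.5 * z * z * r)
    top = max(log_abf)
    scaled = [math.exp(x - top) for x in log_abf]      # subtract max for numerical stability
    total = sum(scaled)
    pips = {vid: s / total for vid, s in zip(variant_ids, scaled)}  # posterior probabilities
    chosen, cumulative = [], 0.0
    for vid in sorted(pips, key=pips.get, reverse=True):
        chosen.append(vid)
        cumulative += pips[vid]
        if cumulative >= coverage:
            break
    return chosen, pips

# Hypothetical locus: one strong signal among weakly associated neighbors.
ids = ["v1", "v2", "v3", "v4"]
chosen, pips = credible_set(ids, [10.0, 1.0, 1.2, 0.5], [0.02] * 4)
```

Adding samples with shorter LD separates Z-scores of correlated variants, which concentrates posterior probability and shrinks the credible set, as observed at VLDLR.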

We further built several weighted polygenic risk scores (PRSs) for each lipid trait based on loci previously reported in European ancestry populations, and used 10-fold cross-validation to evaluate their performance in minority populations. Using the loci reported by GLGC [6], we evaluated three PRS constructs: (1) PRS1, using the reported variants weighted by the GLGC effect estimates; (2) PRS2, using the reported GLGC variants weighted by effect estimates observed in PAGE; and (3) PRS3, using the most significantly associated variant in PAGE at each reported locus (see Methods section), weighted by PAGE effect estimates. For all four lipid traits, PRS2 explained more variance than PRS1, as reflected in lower residual error (S13 Table). PRS3 explained more variance of TG than PRS2, but offered no improvement for HDL-C, LDL-C, or TC (S13 Table).
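The PRS comparison can be sketched on simulated data: one score uses fixed external weights (analogous to PRS1), the other re-estimates weights within the sample using 10-fold cross-validation, so that each person's score uses effects estimated without their own fold (analogous to PRS2). Lower residual variance after regressing the trait on a score means the score explains more variance. The simulation, the per-variant simple regression used to estimate weights, the noisy external weights, and the function names are all assumptions; the text does not specify these details.

```python
import numpy as np

def marginal_effects(G, y):
    """Per-variant simple-regression slopes (stand-ins for population-specific GWAS betas)."""
    Gc = G - G.mean(axis=0)
    yc = y - y.mean()
    return (Gc * yc[:, None]).sum(axis=0) / (Gc ** 2).sum(axis=0)

def residual_variance(y, score):
    """Variance of y left after regressing it on a score (lower = score explains more)."""
    X = np.column_stack([np.ones_like(score), score])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.var(y - X @ coef))

def compare_prs(G, y, external_weights, k=10, seed=0):
    """Residual variance for a PRS with fixed external weights vs. one whose
    weights are re-estimated in-sample under k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    prs_external = G @ external_weights               # PRS1-style: weights fixed in advance
    prs_internal = np.empty(len(y))                   # PRS2-style: weights re-estimated
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        w = marginal_effects(G[train], y[train])      # estimated without the held-out fold
        prs_internal[test_idx] = G[test_idx] @ w
    return residual_variance(y, prs_external), residual_variance(y, prs_internal)

# Simulated data: the "external" weights are a noisy version of the true effects,
# mimicking weights estimated in a different ancestral population (hypothetical).
rng = np.random.default_rng(3)
n, m = 5000, 20
G = rng.binomial(2, 0.3, size=(n, m)).astype(float)
true_effects = rng.normal(0.0, 0.15, size=m)
y = G @ true_effects + rng.normal(size=n)
external = true_effects + rng.normal(0.0, 0.3, size=m)
res_external, res_internal = compare_prs(G, y, external)
```

Estimating weights outside the held-out fold keeps the in-sample score from being overfitted to the same individuals it is evaluated on.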

Functional annotation of the novel loci

Bioinformatic follow-up of the novel loci listed in Table 2 used a comprehensive annotation database constructed with the whole genome sequence annotator (WGSA) [26] and a custom UCSC analysis hub visualizing enhancer and repressor activity, DNase I hypersensitive sites (DHS), and transcribed regions in adult liver and adipose tissue, which facilitated prioritization of putative functional genes and variants. At the HLF locus, the index SNP lies in the 3’UTR of HLF, which encodes a member of the proline and acidic-rich protein family, a subset of the bZIP transcription factors; the SNP has a DANN rank score of 0.90 (≥0.9, possibly deleterious) [27]. In vitro experiments demonstrated reduced cellular lipid content after knockdown of HLF [28]. Of note, TOM1L1, which belongs to the same gene family as the previously reported TOM1 gene, lies ~400 kb from the index SNP (S3 Fig). At the B4GALNT3 locus, which was associated with both LDL-C and TC levels, an LD proxy of the lead variant (rs34019521, r² = 0.85) overlapped enhancer activity in liver and had an Eigen PC phred score of 22.11 (≥17, functional) [29]; this variant was part of the 99% credible set of this locus in the fine-mapping analysis (S12 Table). For each of the nine novel loci, S14 Table summarizes the index SNP and its proxies (r² ≥ 0.4) that have a DANN rank score ≥0.9 (deleterious) or an Eigen PC phred score ≥17 (functional) [29], are eQTLs in the GTEx database [30], or overlap enhancer, repressor, DHS, or transcribed regions. Visualizations of the novel loci in our custom UCSC analysis hub are presented in S3 Fig.
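The prioritization rule summarized in S14 Table can be written as a simple filter. In this sketch the thresholds come from the text (r² ≥ 0.4, DANN rank ≥ 0.9, Eigen PC phred ≥ 17), while the record layout, function name, and variant labels are hypothetical; a real pipeline would read these annotations from WGSA output and GTEx.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple

@dataclass(frozen=True)
class AnnotatedVariant:
    variant_id: str
    r2_with_index: float                      # 1.0 for the index SNP itself
    dann_rank: Optional[float] = None
    eigen_pc_phred: Optional[float] = None
    is_gtex_eqtl: bool = False
    overlaps: FrozenSet[str] = frozenset()    # e.g. {"liver_enhancer", "adipose_DHS"}

def prioritize(variants, r2_min=0.4, dann_min=0.9, eigen_min=17.0):
    """Keep the index SNP and proxies (r^2 >= r2_min) carrying at least one annotation flag."""
    kept: List[Tuple[str, List[str]]] = []
    for v in variants:
        if v.r2_with_index < r2_min:
            continue                          # too weakly linked to the index SNP
        flags = []
        if v.dann_rank is not None and v.dann_rank >= dann_min:
            flags.append("DANN")
        if v.eigen_pc_phred is not None and v.eigen_pc_phred >= eigen_min:
            flags.append("Eigen-PC")
        if v.is_gtex_eqtl:
            flags.append("eQTL")
        flags.extend(sorted(v.overlaps))      # regulatory-element overlaps
        if flags:
            kept.append((v.variant_id, flags))
    return kept

candidates = [
    AnnotatedVariant("index_snp", 1.0, dann_rank=0.90),
    AnnotatedVariant("proxy_a", 0.85, eigen_pc_phred=22.11, overlaps=frozenset({"liver_enhancer"})),
    AnnotatedVariant("proxy_b", 0.30, dann_rank=0.99),
    AnnotatedVariant("proxy_c", 0.60),
]
prioritized = prioritize(candidates)
```

Here `proxy_b` is dropped despite a high DANN score because its LD with the index SNP is below the r² cutoff, and `proxy_c` is dropped because it carries no annotation.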

Discussion

Employing a two-stage meta-analysis, first in our ancestrally diverse minority populations and then in combination with populations of European descent, we identified nine novel loci, five of which showed evidence of replication in independent studies. Using the PrediXcan approach, we further identified a novel gene, SCN11A, and strong candidate target genes at previously known loci. Conditional analysis using individual-level data identified independent minority-specific signals at eight previously established loci. The systematic evaluation of previously reported loci in PAGE revealed a shared genetic background across ethnicities as well as heterogeneity of allelic effects between minority and European ancestry populations.

Our findings demonstrate three benefits of performing GWAS of lipid traits in ancestrally diverse populations with a focus on non-European participants. The first is the refinement of previously established loci through more accurate estimation of effect sizes in non-Europeans. In our PRS examination of reported loci, PRSs weighted by minority-specific effect sizes explained more variance in all four lipid traits in our non-European populations than PRSs weighted by the effect sizes reported by GLGC, which focused on participants of European descent. PRSs that used the top hit at each locus from the minority-specific results, weighted by their effect sizes, explained even more variance in TG; the lack of improvement for the other three lipid traits may reflect insufficient sample sizes in our current analyses. When comparing effect estimates for individual variants across ancestral groups, we observed attenuated effect magnitudes, especially in AA participants, which may reflect the lower European admixture in AA compared with Hispanic populations.

The second benefit is the potential to pinpoint functional variants. Our minority populations enabled us to narrow in on a potential functional variant, rs3135506, at the APOA5 locus associated with TG through LD-based fine-mapping. This index SNP is a missense variant (p.Ser19Trp) with a DANN rank score of 0.995 (deleterious) and shows suggestive evidence of association with coronary artery disease (P = 7.8E-4) [31]. In a previous exome chip analysis of Norwegian participants with a smaller sample size, this missense variant was no longer significant after accounting for the GWAS index SNP rs964184 [32]. In our AA populations, however, rs3135506 was much more strongly associated with TG than rs964184 (P = 1.0E-23 and 1.5E-4, respectively), and its association remained strong after conditioning on rs964184 (P = 1.0E-23 and 2.6E-20 before and after conditional analysis, respectively). These results reflect different LD patterns across ancestral groups (r² = 0.02 and 0.37 in African and European ancestry populations, respectively, in 1000 Genomes Phase 3 data) and emphasize that caution is needed when generalizing results across ancestral populations. Another example is the successful refinement of the VLDLR locus, with only one variant, rs3780181, in the combined 99% credible set. This variant was recently reported to regulate enhancer activity and VLDLR gene expression, supporting it as the best candidate functional variant for this signal [25].

The third benefit is the identification of ancestry-specific variants at both novel and previously reported loci. Of the six novel loci identified in the minority meta-analyses (five in the combined minority analysis and PCSK1 in the Hispanic-specific analysis), four were either monomorphic (PCSK1 and MTHFD2) or had extremely low MAFs (DDHD1 and ZCCHC6) in European ancestry populations. At the DDHD1 locus, the novel association was driven mainly by the signal in Native Hawaiian ancestry populations, which had not been examined before. The minor allele T of rs75405126 is more common in Native Hawaiians than in Europeans (MAF = 0.008 and 0.001, respectively); it also has a relatively high MAF in Asians (0.007) and is monomorphic in AA and Hispanic ancestry populations. A previous study showed that DDHD1 expression in leukocytes was correlated with plasma HDL-C level [33]. Nevertheless, further investigation and successful replication of these ancestry-specific novel loci are needed. In addition, the availability of individual-level data for all minority participants in PAGE enabled more accurate conditional analyses, leading to the identification of minority-specific independent signals at eight previously reported loci. These signals were monomorphic in European ancestry populations, and six of these loci harbored missense or LoF variants, which could be potential targets in future pharmaceutical studies (S9 Table). These findings, which may have been missed by European-focused GWAS, contribute to a more complete picture of the genetic architecture of lipid metabolism.

Despite the heterogeneity and disparities we observed, the evidence for a shared genetic architecture of the four lipid traits across ancestral groups was strong. Among the independent significant variants reported by GLGC [7], 21%, 18%, 20%, and 25% reached genome-wide significance in the minority meta-analysis for HDL-C, LDL-C, TC, and TG, respectively, with consistent directions of association between minority and European ancestry populations. The total sample size of our minority population was only 15% of that in the GLGC discovery stage, and we expect many more loci to surpass the genome-wide significance threshold as sample sizes increase. In addition, two of the replicated novel loci from the minority plus European ancestry meta-analysis, B4GALNT3 and GPCPD1, were jointly driven by signals in minority and European ancestry populations. B4GALNT3 (beta-1,4-N-acetyl-galactosaminyltransferase 3) and the previously reported GALNT2 gene both belong to the N-acetylgalactosaminyltransferase family [34]. GPCPD1 encodes glycerophosphocholine phosphodiesterase 1, and knockdown of this gene altered lipid metabolites in cells [35].

Although our study represents the most ancestrally diverse minority dataset for lipid traits to date, it has several limitations. First, the sample sizes for the Native Hawaiian and Native American ancestry populations were limited, which may have hindered the discovery of novel loci in these two ancestral groups. Second, we were not well powered to systematically examine potential sex differences in the effects of lipid-associated loci in our minority ancestry populations, although previous studies in European ancestry populations have reported a global difference in SNP effects between sexes [9]. Third, we may not have captured potential heterogeneity within ancestral groups. For example, the Hispanic participants in the current analysis came from various regions, including Central and South America, Puerto Rico, and Mexico, and were grouped together for association testing. Such heterogeneity may also have contributed to the failure to replicate some novel loci, even in ancestry-matched samples. Fourth, the associations of the novel loci and independent signals with lipids and related diseases need to be explored further in functional studies to better understand the underlying mechanisms. Finally, the use of European reference transcriptomes may have introduced bias into our PrediXcan analysis, emphasizing the need for data collection and better model construction in minority populations.

In summary, we identified nine novel loci, minority-specific independent signals at eight previously established loci, and one novel gene from the PrediXcan analysis, and observed different effect estimates of the associated variants across ancestral groups; together, these results reinforce the need to conduct genetic association studies in participants of diverse ancestral backgrounds. Findings in these currently underrepresented populations will provide new insights into the genetics of lipids and associated diseases, paving the way toward precision medicine.

Materials and Methods

Ethics statement

All studies were approved by local Institutional Review Boards, and written informed consent was obtained from each participant. The Fred Hutch Institutional Review Board approved the study under protocol number 8071.

Study population

The PAGE study incorporated 45,698 minority participants from ARIC, BioMe, CARDIA, MEC, HCHS/SOL, and WHI with available lipid measurements, primarily AAs, Hispanics, Asians, Native Hawaiians, and Native Americans (Table 1). In addition, a total of 22,887 European ancestry participants with lipid measurements from ARIC, BioMe, CARDIA and WHI were available in PAGE. Detailed characteristics of each study are provided in S1 Table, and detailed information for each study is presented in S1 Text.

Measurement of lipid levels

HDL-C, TC, and TG levels (mg/dL) were measured in fasting blood, and LDL-C levels were calculated using the Friedewald equation. LDL-C was not calculated if the corresponding TG level was greater than 400 mg/dL. Lipid levels were further adjusted for medication use by adding a constant based on previous publications [24] (S15 Table); if multiple medications were used, only the largest constant was applied. Participants who were pregnant at blood draw or who had fasted for less than 8 hours prior to blood draw were excluded from the analysis. Medication-adjusted TG levels were natural-log transformed. Summary statistics of lipid levels in all PAGE participants, together with ancestry- and study-stratified results, are provided in S1 Table.
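The phenotype preparation steps above can be sketched in Python as follows. This is a minimal illustration, assuming all inputs are in mg/dL; the function names are ours, and the medication constants are hypothetical placeholders passed in by the caller (the published values are listed in S15 Table). How these helpers are chained in the actual pipeline is not specified here.

```python
import math
from typing import Optional, Sequence

TG_LIMIT_MG_DL = 400.0  # LDL-C is left missing above this TG level
MIN_FASTING_HOURS = 8.0


def is_eligible(pregnant: bool, fasting_hours: float) -> bool:
    """Exclude pregnant participants and those fasting < 8 h before blood draw."""
    return not pregnant and fasting_hours >= MIN_FASTING_HOURS


def friedewald_ldl(tc: float, hdl: float, tg: float) -> Optional[float]:
    """Friedewald equation in mg/dL: LDL-C = TC - HDL-C - TG / 5."""
    if tg > TG_LIMIT_MG_DL:
        return None  # equation not applied when TG > 400 mg/dL
    return tc - hdl - tg / 5.0


def adjust_for_medication(value: float, constants: Sequence[float]) -> float:
    """Add only the largest constant among the medications a participant uses.

    `constants` holds hypothetical per-medication values; an empty
    sequence (no medication) leaves the value unchanged.
    """
    return value + max(constants, default=0.0)


def log_tg(tg_adjusted: float) -> float:
    """Natural-log transform applied to medication-adjusted TG."""
    return math.log(tg_adjusted)
```

For example, `friedewald_ldl(200, 50, 150)` gives 120.0 mg/dL, whereas a TG level of 450 mg/dL returns `None`.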

Genotyping, imputation and quality control

The PAGE minority populations were genotyped using two different strategies. A total of 33,063 participants with lipid measurements (10,085 AAs, 17,751 Hispanics, 2,378 Asians, 1,912 Native Hawaiians, 604 Native Americans, and 333 others) were genotyped on the MEGA array, which was specifically designed to substantially increase variant coverage across multiple ethnic groups [15]. The MEGA array genotyped 1,705,969 genetic variants. Quality control (QC) filters were applied at both the individual sample level and the SNP level. At the sample level, samples with evidence of sex discrepancy, Mendelian inconsistency, unexpected duplication or non-duplication, poor performance, DNA mixture, identity issues, or restricted consent were excluded. At the SNP level, SNPs meeting any of the following criteria were excluded: (1) failed the technical filters of the Center for Inherited Disease Research (CIDR) at Johns Hopkins University; (2) call rate <98%; (3) discordant calls in study duplicates; (4) >1 Mendelian error in trios and duos; (5) Hardy-Weinberg P<1E-4; (6) sex difference in allele frequency ≥0.2 for autosomes/XY; (7) sex difference in heterozygosity >0.3 for autosomes/XY; (8) positional duplicates. SNPs that passed QC were then phased with SHAPEIT2 and imputed to the 1000 Genomes Phase 3 reference panel with IMPUTE (version 2.3.2), yielding 39,723,562 imputed SNPs with an IMPUTE info score of at least 0.4.
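The SNP-level exclusion rules for the MEGA array translate directly into a filter. The sketch below assumes the per-SNP summary statistics have already been computed; the `SnpQcStats` type and its field names are hypothetical, and applying the two sex-difference filters to every SNP (rather than only to autosomes/XY) is a simplification.

```python
from dataclasses import dataclass


@dataclass
class SnpQcStats:
    """Per-SNP QC summary (hypothetical field names)."""
    passed_cidr_filters: bool
    call_rate: float               # fraction of samples with a genotype call
    discordant_in_duplicates: bool
    mendelian_errors: int          # errors across trios and duos
    hwe_p: float                   # Hardy-Weinberg equilibrium P value
    sex_af_diff: float             # |allele frequency (male) - (female)|
    sex_het_diff: float            # |heterozygosity (male) - (female)|
    positional_duplicate: bool


def passes_mega_snp_qc(s: SnpQcStats) -> bool:
    """True if the SNP survives all eight MEGA SNP-level filters."""
    return (
        s.passed_cidr_filters              # (1) CIDR technical filters
        and s.call_rate >= 0.98            # (2) call rate < 98% excluded
        and not s.discordant_in_duplicates # (3)
        and s.mendelian_errors <= 1        # (4) > 1 error excluded
        and s.hwe_p >= 1e-4                # (5) HWE P < 1E-4 excluded
        and s.sex_af_diff < 0.2            # (6) difference >= 0.2 excluded
        and s.sex_het_diff <= 0.3          # (7) difference > 0.3 excluded
        and not s.positional_duplicate     # (8)
    )


def passes_imputation_qc(info: float) -> bool:
    """Imputed SNPs are kept when the IMPUTE info score is at least 0.4."""
    return info >= 0.4
```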

A total of 7,556 AA, 5,079 Hispanic, and 22,887 European ancestry participants with lipid measurements from ARIC, BioMe, CARDIA, MEC, and WHI had previously been genotyped on either Affymetrix or Illumina arrays within each individual study (S1 Text). Although the QC filters varied slightly by study, criteria similar to those listed above were used, including exclusion of: (1) low call rate <90%; (2) discordant calls in study duplicates; (3) >1 Mendelian error in trios and duos; (4) Hardy-Weinberg P<1E-6; (5) sex difference in allele frequency; (6) positional duplicates; (7) ancestry outliers. The genotype data from these studies were imputed to the 1000 Genomes Phase 3 panel using IMPUTE (version 2.3.2) in each study separately, and SNPs with an info score less than 0.4 were excluded.

Statistical analyses

The association analysis in the discovery stage comprised two parts: the minority meta-analysis and the minority plus European meta-analysis. In the minority meta-analysis, we combined the 33,063 participants genotyped on the MEGA array with the additional 7,556 AAs and 5,079 Hispanics genotyped on other arrays through fixed-effect inverse-variance-weighted meta-analysis in METAL [36]. Before the minority meta-analysis, all participants genotyped on the MEGA array were pooled for association testing, with adjustment for age, sex, study, self-identified ethnicity (as a proxy for cultural background), center, household membership, and the first 10 PCs; lipid levels were inverse-normally transformed within each sex. For the additional 7,556 AAs and 5,079 Hispanics, association analyses were performed in each study separately, with adjustment for age, sex, and the first 10 PCs; lipid levels were inverse-normally transformed within each sex in each study/genotyping array. All association analyses were performed using SUGEN, which implements a generalized estimating equation (GEE) method that accounts for relatedness [37]. Before meta-analysis, SNPs were excluded if their effective sample size (effN = 2×MAF×(1−MAF)×N×info, where MAF is the minor allele frequency, N is the sample size, and info is the IMPUTE2 info score) was less than 30 in MEGA or less than 5 in the individual studies. In the minority plus European ancestry meta-analysis, we combined 45,698 PAGE minority ancestry participants, 22,887 PAGE European ancestry participants, and publicly available association summary statistics from the ENGAGE Consortium through fixed-effect inverse-variance-weighted meta-analysis in METAL, for a total sample size of over 131,000. Sample-size-weighted meta-analyses were also performed; the results are shown in S16 Table. SNPs available in only one study were excluded after meta-analysis.
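A minimal standard-library sketch of three steps in this paragraph: the rank-based inverse-normal transformation (to be applied within each sex), the effN filter, and the fixed-effect inverse-variance-weighted combination that METAL's standard-error scheme performs. The (rank − 0.5)/n offset and average-rank tie handling are assumptions, since the text does not specify them, and the per-study (beta, SE) inputs would in practice come from SUGEN's GEE models, which this sketch does not reproduce.

```python
import math
from statistics import NormalDist
from typing import List, Sequence, Tuple

_STD_NORMAL = NormalDist()


def inverse_normal_transform(values: Sequence[float]) -> List[float]:
    """Rank-based inverse-normal transform; apply separately within each sex.

    Uses the (rank - 0.5) / n offset and average ranks for ties (assumptions).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # 1-based rank shared by ties
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    return [_STD_NORMAL.inv_cdf((r - 0.5) / n) for r in ranks]


def effective_n(maf: float, n: int, info: float) -> float:
    """effN = 2 * MAF * (1 - MAF) * N * info."""
    return 2.0 * maf * (1.0 - maf) * n * info


def keep_for_meta(eff_n: float, is_mega: bool) -> bool:
    """Keep SNPs with effN >= 30 (MEGA) or >= 5 (individual studies)."""
    return eff_n >= (30.0 if is_mega else 5.0)


def ivw_fixed_effect(
    estimates: Sequence[Tuple[float, float]],
) -> Tuple[float, float, float]:
    """Fixed-effect inverse-variance-weighted meta-analysis.

    `estimates` holds one (beta, se) pair per study, aligned to the same
    effect allele. Returns (pooled beta, pooled SE, two-sided P).
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    p = 2.0 * _STD_NORMAL.cdf(-abs(beta / se))
    return beta, se, p
```

Each study is weighted by 1/SE², so the pooled SE shrinks as studies are added; two studies with equal SEs of 0.05 give a pooled SE of 0.05/√2.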
Conditional analyses adjusting for previously established loci were performed to determine whether the novel loci were independent of known signals and to explore residual signals. The list of previously established lipid-associated loci was hand-curated by integrating SNPs indexed in the GWAS Catalog (accessed September 23, 2018) and SNPs identified through non-GWAS arrays covering only part of the genome (Metabochip or Exome chip) [5–7, 9]. On each chromosome, previously reported SNPs with P<0.05 in our meta-analysis were included as covariates in the model, which kept the conditional analysis efficient. All conditional analyses in PAGE were performed on individual-level data using SUGEN, whereas the conditional analysis in ENGAGE was performed on summary statistics using GCTA-COJO [38]. Because all ENGAGE participants were of European descent, the LD matrix was estimated from 9,345 European ancestry participants from the ARIC study available in PAGE. Novel loci were defined as those fulfilling all three of the following criteria: (1) the lead SNP reached genome-wide significance (P<5.0E-8) in both the marginal and the conditional analysis; (2) the lead SNP was located more than 500 kb from any previously established locus; and (3) the lead SNP had at least one neighboring SNP (within ±500 kb) showing suggestive significance (P<1.0E-5).
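The three novel-locus criteria can be expressed as a single predicate. This sketch assumes each previously established locus is represented by one position (for example, its reported SNP) on the same genome build as the lead SNP; the `LeadSnp` type and the argument layout are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

GENOME_WIDE = 5.0e-8  # criterion (1)
SUGGESTIVE = 1.0e-5   # criterion (3)
WINDOW_BP = 500_000   # +/- 500 kb, criteria (2) and (3)


@dataclass
class LeadSnp:
    chrom: str
    pos: int
    p_marginal: float
    p_conditional: float


def is_novel_locus(
    lead: LeadSnp,
    known_loci: Sequence[Tuple[str, int]],          # (chrom, pos) of established loci
    neighbors: Sequence[Tuple[str, int, float]],    # (chrom, pos, P) of other SNPs
) -> bool:
    # (1) genome-wide significant in both marginal and conditional analyses
    if not (lead.p_marginal < GENOME_WIDE and lead.p_conditional < GENOME_WIDE):
        return False
    # (2) more than 500 kb from every previously established locus
    for chrom, pos in known_loci:
        if chrom == lead.chrom and abs(pos - lead.pos) <= WINDOW_BP:
            return False
    # (3) at least one other SNP within +/- 500 kb with P < 1E-5
    return any(
        chrom == lead.chrom
        and 0 < abs(pos - lead.pos) <= WINDOW_BP
        and p < SUGGESTIVE
        for chrom, pos, p in neighbors
    )
```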

In the replication stage, summary statistics for the nine novel loci were extracted from MVP, the Hispanic GWAS meta-analysis, eMERGE, Kaiser, JHS, and UKBB, stratified by ancestry and, where applicable, combined across ancestries. In MVP, a proxy variant (rs537734545, r² = 1) was used in place of the MTHFD2 lead SNP rs186278890. All replication studies used a trait transformation strategy similar to that of the discovery studies except the Kaiser study, in which HDL was square-root transformed, TG was log transformed, and LDL and TC were not transformed [18]. Sample-size-weighted meta-analyses were performed to combine the summary statistics from each study. Details of each replication study are presented in S1 Text.
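Sample-size weighting combines signed Z-scores rather than effect sizes, which is one reason it suits replication studies that transformed the traits differently (as Kaiser did): the study-specific betas are not on a common scale, but their Z-scores are comparable. The sketch below follows the √N-weighted Z-score scheme that METAL uses for sample-size-based meta-analysis; it assumes every Z-score is oriented to the same effect allele, and the function names are ours.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def z_score(beta: float, se: float) -> float:
    """Signed Z-score of one study's estimate."""
    return beta / se


def sample_size_weighted_meta(results: Sequence[Tuple[float, int]]) -> Tuple[float, float]:
    """Combine (z, N) pairs with weights sqrt(N).

    Z_meta = sum(w_i * z_i) / sqrt(sum(w_i ** 2)); returns (Z_meta, two-sided P).
    """
    weights = [math.sqrt(n) for _, n in results]
    numerator = sum(w * z for w, (z, _) in zip(weights, results))
    z_meta = numerator / math.sqrt(sum(w * w for w in weights))
    p = 2.0 * NormalDist().cdf(-abs(z_meta))
    return z_meta, p
```

Two studies of equal size that each report Z = 2 combine to Z = 2√2 ≈ 2.83, illustrating how consistent evidence accumulates across studies.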

To examine potential heterogeneity at previously known loci in PAGE minority ancestry participants, we performed interaction analyses by including SNP×PC terms for each of the first 10 PCs in the models. Models with and without the interaction terms were compared using an F-test, yielding an overall SNP×PC interaction P value (P_SNP×PC) for each known variant. This P value indicates whether the additional variance explained by the interaction terms is statistically significant, which would represent effect modification driven by genetic ancestry [20]. The P_SNP×PC values for all previously reported variants are shown in S7 Table.
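The nested-model comparison can be sketched with ordinary least squares in NumPy. This is an illustration under simplifying assumptions: it ignores relatedness and household structure (the study's models were fit in SUGEN), assumes both design matrices have full column rank, and returns the F statistic with its degrees of freedom rather than a P value. The function name is hypothetical.

```python
import numpy as np


def snp_pc_interaction_f(y, snp, covariates, pcs):
    """F-test of models without vs. with SNP x PC interaction terms.

    y: (n,) phenotype; snp: (n,) allele dosage;
    covariates: (n, k) adjustment covariates (including the PCs);
    pcs: (n, m) PCs whose interactions with the SNP are tested.
    Returns (F, numerator df, denominator df).
    """
    n = y.shape[0]
    x_null = np.column_stack([np.ones(n), snp, covariates])
    x_full = np.column_stack([x_null, snp[:, None] * pcs])  # add SNP x PC terms

    def rss(x):
        beta, *_ = np.linalg.lstsq(x, y, rcond=None)
        resid = y - x @ beta
        return float(resid @ resid)

    rss_null, rss_full = rss(x_null), rss(x_full)
    df_num = x_full.shape[1] - x_null.shape[1]  # number of interaction terms
    df_den = n - x_full.shape[1]
    f_stat = ((rss_null - rss_full) / df_num) / (rss_full / df_den)
    return f_stat, df_num, df_den
```

A P value can then be taken from the upper tail of the F distribution, for example with `scipy.stats.f.sf(f_stat, df_num, df_den)`.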

The PRSs were constructed by summing the counts of lipid trait-increasing alleles at the associated variants (reported by GLGC [6]), each weighted by the corresponding effect size. In PRS3, the most significant variant in PAGE at each GLGC-reported locus (±500 kb) was used. The full list of variants used in the PRSs is shown in S17 Table. Ten-fold cross-validation was used to estimate the trait variance explained by PRS1, PRS2, and PRS3 with four linear models: (1) model 0, which included all covariates in the association analysis (age, sex, study, self-identified ethnicity, center, household membership, and the first 10 PCs); (2) model 1, which included PRS1 in addition to all covariates in model 0; (3) model 2, which included PRS2 in addition to all covariates in model 0; and (4) model 3, which included PRS3 in addition to all covariates in model 0. The residuals of the four models were used to determine whether the different scores improved the estimation of lipid levels.
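The score construction and the cross-validated comparison against model 0 can be sketched as below. The text states that residuals were compared across models but not the exact summary statistic, so using out-of-fold R² (1 − RSS/TSS) and its gain over model 0 is our assumption, as are the function names. Reusing the same seed keeps the fold assignment identical across the compared models.

```python
import numpy as np


def weighted_prs(dosages, weights):
    """Sum of trait-increasing allele counts, each weighted by its effect size.

    dosages: (n, m) allele counts; weights: (m,) per-variant effect sizes.
    """
    return dosages @ weights


def cv_r2(y, x, n_folds=10, seed=0):
    """Out-of-fold R^2 of an OLS model with intercept, via k-fold CV."""
    n = y.shape[0]
    folds = np.array_split(np.random.default_rng(seed).permutation(n), n_folds)
    design = np.column_stack([np.ones(n), x])
    pred = np.empty(n)
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        beta, *_ = np.linalg.lstsq(design[train], y[train], rcond=None)
        pred[test_idx] = design[test_idx] @ beta  # predict the held-out fold
    resid = y - pred
    return 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())


def prs_incremental_r2(y, covariates, prs, n_folds=10, seed=0):
    """Gain in out-of-fold R^2 when a PRS is added to the covariate-only model 0."""
    r2_model0 = cv_r2(y, covariates, n_folds, seed)
    r2_with_prs = cv_r2(y, np.column_stack([covariates, prs]), n_folds, seed)
    return r2_with_prs - r2_model0
```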

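Explained phenotypic variance for each genetic variant was calculated using the equation below [39], where β is the estimated effect size, SE is its standard error, MAF is the minor allele frequency, and N is the sample size:

$$\text{Explained phenotypic variance} = \frac{2\beta^{2}\,\mathrm{MAF}(1-\mathrm{MAF})}{2\beta^{2}\,\mathrm{MAF}(1-\mathrm{MAF}) + \mathrm{SE}^{2}\cdot 2N\,\mathrm{MAF}(1-\mathrm{MAF})}$$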
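The explained-variance formula is straightforward to compute per variant; a small helper with a hypothetical name is shown below.

```python
def explained_variance(beta: float, se: float, maf: float, n: int) -> float:
    """Phenotypic variance explained by one variant (formula from ref. 39)."""
    het = maf * (1.0 - maf)
    signal = 2.0 * beta ** 2 * het      # 2 * beta^2 * MAF * (1 - MAF)
    noise = se ** 2 * 2.0 * n * het     # SE^2 * 2N * MAF * (1 - MAF)
    return signal / (signal + noise)
```

Because MAF(1−MAF) appears in every term, it cancels, and the expression reduces to β²/(β² + N·SE²). For example, β = 0.1, SE = 0.01, and N = 10,000 give about 0.0099 regardless of MAF.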

In the fine-mapping analyses of previously GLGC-reported loci [6] using FINEMAP [23], we first computed the posterior probability (PP) of each variant within ±1 Mb of the top hits in GLGC, PAGE AA, and PAGE Hispanic participants. Only variants available in both GLGC and PAGE minority data were included in the analysis. The number of causal variants in each region was set to one in FINEMAP; with a single causal variant per region, the FINEMAP results do not depend on the reference LD. We then constructed the 99% credible set for each reported locus in GLGC, PAGE AA, and PAGE Hispanic participants, respectively. The combined PP of each variant was calculated by multiplying its PPs across the three ancestral groups and rescaling by the sum of these products over all variants in the locus:

$$\mathrm{PP}_{i} = \frac{\mathrm{PP}_{i,\mathrm{AA}} \times \mathrm{PP}_{i,\mathrm{HA}} \times \mathrm{PP}_{i,\mathrm{GLGC}}}{\sum_{j} \mathrm{PP}_{j,\mathrm{AA}} \times \mathrm{PP}_{j,\mathrm{HA}} \times \mathrm{PP}_{j,\mathrm{GLGC}}}$$

where i and j index variants within a locus, and the AA and HA (Hispanic) results are from PAGE. We used this approach instead of meta-analyzing the three groups because the European ancestry sample from GLGC (N = 188,577) is much larger than the AA (N = 17,641) and Hispanic (N = 22,830) samples in PAGE. Meta-analysis results would be driven mainly by GLGC, which would in turn bias the fine-mapping. The approach we used took into account both the association P value and the sample size of each group. In the fine-mapping analyses of the nine novel loci, the LD matrix was estimated using the ancestral group that drove the significance of the locus. In particular, LD estimates for 5q31 and DDHD1 were generated using all minority ancestry participants genotyped on the MEGA array (N = 51,520) and Native Hawaiian participants genotyped on the MEGA array (N = 3,944), respectively.
LD estimates for HLF, B4GALNT3, and GPCPD1 were generated by combining LD information from all minority ancestry participants genotyped on the MEGA array (N = 51,520) and all available European ancestry participants from ARIC in PAGE (N = 9,345) using a sample-size-weighted approach. For the DLC1 and PCSK1 loci, which were driven by Hispanic signals, and the ZCCHC6 and MTHFD2 loci, which were driven by AA signals, LD estimates were generated using all Hispanic (N = 22,244) and AA (N = 17,324) participants genotyped on the MEGA array, respectively.
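The fine-mapping bookkeeping described in the two paragraphs above can be sketched in Python: multiplying per-variant PPs across the three groups and rescaling, building a 99% credible set, and combining LD matrices by sample size. Variant identifiers, the dictionary layout, and the function names are illustrative. The credible-set construction (adding variants in decreasing PP order until the cumulative PP reaches the target) is the usual one and is assumed here. Because the text does not spell out the sample-size-weighted LD combination, a weighted average of ancestry-specific correlation matrices with weights proportional to sample size is likewise our assumption.

```python
from typing import Dict, List, Sequence

import numpy as np


def combine_posteriors(*pp_by_group: Dict[str, float]) -> Dict[str, float]:
    """Multiply each variant's PP across groups, then rescale to sum to 1.

    Only variants present in every group are retained.
    """
    shared = set.intersection(*(set(pp) for pp in pp_by_group))
    products = {}
    for variant in shared:
        product = 1.0
        for pp in pp_by_group:
            product *= pp[variant]
        products[variant] = product
    total = sum(products.values())
    return {variant: value / total for variant, value in products.items()}


def credible_set(pp: Dict[str, float], level: float = 0.99) -> List[str]:
    """Add variants in decreasing PP order until the cumulative PP >= level."""
    chosen: List[str] = []
    cumulative = 0.0
    for variant, prob in sorted(pp.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(variant)
        cumulative += prob
        if cumulative >= level:
            break
    return chosen


def weighted_ld(ld_matrices: Sequence[np.ndarray], sample_sizes: Sequence[int]) -> np.ndarray:
    """Sample-size-weighted average of LD correlation matrices (same SNP order)."""
    weights = np.asarray(sample_sizes, dtype=float)
    stacked = np.stack([np.asarray(r, dtype=float) for r in ld_matrices])
    return np.tensordot(weights / weights.sum(), stacked, axes=1)
```

For HLF, B4GALNT3, and GPCPD1, the weights under this assumption would be 51,520 and 9,345.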

Bioinformatic functional follow-up and PrediXcan analysis

Bioinformatic functional follow-up was performed for each novel locus using our comprehensive functional annotation database and a custom UCSC analysis data hub. In the PrediXcan analysis, we focused on adipose tissue, liver, and whole blood, which are closely linked to lipid metabolism. Prediction models were available for 8,271, 6,594, 3,355, and 6,298 genes in subcutaneous adipose tissue, visceral adipose tissue, liver, and whole blood, respectively, and genes with P<2.04E-6 [0.05/(8,271+6,594+3,355+6,298)] were considered significant. Details are presented in S1 Text.
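The Bonferroni threshold above is a simple calculation over the number of gene-tissue models tested; the snippet below reproduces the arithmetic (variable and function names are ours).

```python
# Number of genes with PrediXcan prediction models per tissue (from the text).
genes_per_tissue = {
    "subcutaneous adipose": 8271,
    "visceral adipose": 6594,
    "liver": 3355,
    "whole blood": 6298,
}

n_tests = sum(genes_per_tissue.values())  # 24,518 gene-tissue tests
bonferroni_threshold = 0.05 / n_tests     # about 2.04e-6


def is_significant(p_value: float) -> bool:
    """Bonferroni-corrected significance for one gene-tissue association."""
    return p_value < bonferroni_threshold
```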

Supporting information

S1 Fig
QQ plots of the meta-analyses.

S2 Fig [hg19]
Locuszoom plots for the nine novel loci.

S3 Fig
Functional annotation of the nine novel loci.

S1 Table [xlsx]
Characteristics of study samples in PAGE MEGA and non-MEGA studies.

S2 Table [xlsx]
Study-specific and conditional results of the novel loci identified in the minority meta-analysis.

S3 Table [xlsx]
Ethnic-specific results of the novel loci identified in the minority meta-analysis.

S4 Table [xlsx]
Study-specific and conditional results of the novel loci identified in the minority plus European meta-analysis.

S5 Table [xlsx]
Replication of the nine novel loci.

S6 Table [xlsx]
Significant gene-lipid associations identified in the PrediXcan analysis.

S7 Table [xlsx]
Results of previously reported variants in PAGE minorities genotyped on MEGA.

S8 Table [xlsx]
Phenotypic variance explained by the GLGC reported variants.

S9 Table [xlsx]
Independent signals at previously established loci.

S10 Table [xlsx]
Fine-map of reported loci using AA-specific LD.

S11 Table [xlsx]
Fine-map of the GLGC reported loci using FINEMAP.

S12 Table [xlsx]
Fine-map of the nine novel loci using FINEMAP.

S13 Table [xlsx]
Performance of the PRSs for each lipid trait.

S14 Table [xlsx]
Novel loci annotation summary.

S15 Table [xlsx]
Medication adjustment of lipid levels.

S16 Table [xlsx]
Sample-size-weighted meta-analysis results for the nine novel loci.

S17 Table [xlsx]
SNPs used in building the 3 constructs of PRSs.

S1 Text [docx]
Supplementary methods.

References

1. Emerging Risk Factors Collaboration, Di Angelantonio E, Sarwar N, Perry P, Kaptoge S, Ray KK, et al. Major lipids, apolipoproteins, and risk of vascular disease. JAMA. 2009;302(18):1993–2000. Epub 2009/11/12. doi: 10.1001/jama.2009.1619 19903920; PubMed Central PMCID: PMC3284229.

2. Qi Q, Liang L, Doria A, Hu FB, Qi L. Genetic predisposition to dyslipidemia and type 2 diabetes risk in two prospective cohorts. Diabetes. 2012;61(3):745–52. Epub 2012/02/09. doi: 10.2337/db11-1254 22315312; PubMed Central PMCID: PMC3282815.

3. Oresic M, Hyotylainen T, Kotronen A, Gopalacharyulu P, Nygren H, Arola J, et al. Prediction of non-alcoholic fatty-liver disease and liver fat content by serum molecular lipids. Diabetologia. 2013;56(10):2266–74. Epub 2013/07/05. doi: 10.1007/s00125-013-2981-2 23824212; PubMed Central PMCID: PMC3764317.

4. Kettunen J, Tukiainen T, Sarin AP, Ortega-Alonso A, Tikkanen E, Lyytikainen LP, et al. Genome-wide association study identifies multiple loci influencing human serum metabolite levels. Nat Genet. 2012;44(3):269–76. Epub 2012/01/31. doi: 10.1038/ng.1073 22286219; PubMed Central PMCID: PMC3605033.

5. Teslovich TM, Musunuru K, Smith AV, Edmondson AC, Stylianou IM, Koseki M, et al. Biological, clinical and population relevance of 95 loci for blood lipids. Nature. 2010;466(7307):707–13. Epub 2010/08/06. doi: 10.1038/nature09270 20686565; PubMed Central PMCID: PMC3039276.

6. Willer CJ, Schmidt EM, Sengupta S, Peloso GM, Gustafsson S, Kanoni S, et al. Discovery and refinement of loci associated with lipid levels. Nat Genet. 2013;45(11):1274–83. Epub 2013/10/08. doi: 10.1038/ng.2797 24097068; PubMed Central PMCID: PMC3838666.

7. Liu DJ, Peloso GM, Yu H, Butterworth AS, Wang X, Mahajan A, et al. Exome-wide association study of plasma lipids in >300,000 individuals. Nat Genet. 2017;49(12):1758–66. Epub 2017/10/31. doi: 10.1038/ng.3977 29083408; PubMed Central PMCID: PMC5709146.

8. Lu X, Peloso GM, Liu DJ, Wu Y, Zhang H, Zhou W, et al. Exome chip meta-analysis identifies novel loci and East Asian-specific coding variants that contribute to lipid levels and coronary artery disease. Nat Genet. 2017;49(12):1722–30. Epub 2017/10/31. doi: 10.1038/ng.3978 29083407; PubMed Central PMCID: PMC5899829.

9. Hoffmann TJ, Theusch E, Haldar T, Ranatunga DK, Jorgenson E, Medina MW, et al. A large electronic-health-record-based genome-wide study of serum lipids. Nat Genet. 2018;50(3):401–13. Epub 2018/03/07. doi: 10.1038/s41588-018-0064-5 29507422; PubMed Central PMCID: PMC5942247.

10. Bentley AR, Sung YJ, Brown MR, Winkler TW, Kraja AT, Ntalla I, et al. Multi-ancestry genome-wide gene-smoking interaction study of 387,272 individuals identifies new loci associated with serum lipids. Nat Genet. 2019;51(4):636–48. Epub 2019/03/31. doi: 10.1038/s41588-019-0378-y 30926973.

11. Deo RC, Reich D, Tandon A, Akylbekova E, Patterson N, Waliszewska A, et al. Genetic Differences between the Determinants of Lipid Profile Phenotypes in African and European Americans: The Jackson Heart Study. PLoS Genet. 2009;5(1):e1000342. doi: 10.1371/journal.pgen.1000342

12. Bermudez OI, Velez-Carrasco W, Schaefer EJ, Tucker KL. Dietary and plasma lipid, lipoprotein, and apolipoprotein profiles among elderly Hispanics and non-Hispanics and their association with diabetes. Am J Clin Nutr. 2002;76(6):1214–21. doi: 10.1093/ajcn/76.6.1214 12450885

13. Klarin D, Damrauer SM, Cho K, Sun YV, Teslovich TM, Honerlaw J, et al. Genetics of blood lipids among ~300,000 multi-ethnic participants of the Million Veteran Program. Nat Genet. 2018. Epub 2018/10/03. doi: 10.1038/s41588-018-0222-9 30275531.

14. Matise TC, Ambite JL, Buyske S, Carlson CS, Cole SA, Crawford DC, et al. The Next PAGE in Understanding Complex Traits: Design for the Analysis of Population Architecture Using Genetics and Epidemiology (PAGE) Study. Am J Epidemiol. 2011;174(7):849–59. doi: 10.1093/aje/kwr160 21836165

15. Bien SA, Wojcik GL, Zubair N, Gignoux CR, Martin AR, Kocarnik JM, et al. Strategies for Enriching Variant Coverage in Candidate Disease Loci on a Multiethnic Genotyping Array. PLoS One. 2016;11(12):e0167758. Epub 2016/12/16. doi: 10.1371/journal.pone.0167758 27973554; PubMed Central PMCID: PMC5156387.

16. Surakka I, Horikoshi M, Magi R, Sarin AP, Mahajan A, Lagou V, et al. The impact of low-frequency and rare variants on lipid levels. Nat Genet. 2015;47(6):589–97. Epub 2015/05/12. doi: 10.1038/ng.3300 25961943; PubMed Central PMCID: PMC4757735.

17. Stanaway IB, Hall TO, Rosenthal EA, Palmer M, Naranbhai V, Knevel R, et al. The eMERGE genotype set of 83,717 subjects imputed to ~40 million variants genome wide and association with the herpes zoster medical record phenotype. Genet Epidemiol. 2019;43(1):63–81. Epub 2018/10/10. doi: 10.1002/gepi.22167 30298529.

18. Hoffmann TJ, Theusch E, Haldar T, Ranatunga DK, Jorgenson E, Medina MW, et al. A large electronic-health-record-based genome-wide study of serum lipids. Nat Genet. 2018;50(3):401–13. doi: 10.1038/s41588-018-0064-5 29507422

19. Taylor HA, Wilson JG, Jones DW, Sarpong DF, Srinivasan A, Garrison RJ, et al. Toward resolution of cardiovascular health disparities in African Americans: Design and methods of the Jackson Heart Study. Ethn Dis. 2005;15(4):S4–S17.

20. Wojcik GL, Graff M, Nishimura KK, Tao R, Haessler J, Gignoux CR, et al. Genetic analyses of diverse populations improves discovery for complex traits. Nature. 2019;570(7762):514–8. Epub 2019/06/21. doi: 10.1038/s41586-019-1310-4 31217584.

21. Leipold E, Liebmann L, Korenke GC, Heinrich T, Giesselmann S, Baets J, et al. A de novo gain-of-function mutation in SCN11A causes loss of pain perception. Nat Genet. 2013;45(11):1399–404. Epub 2013/09/17. doi: 10.1038/ng.2767 24036948.

22. Wainberg M, Sinnott-Armstrong N, Mancuso N, Barbeira AN, Knowles DA, Golan D, et al. Opportunities and challenges for transcriptome-wide association studies. Nat Genet. 2019;51(4):592–9. Epub 2019/03/31. doi: 10.1038/s41588-019-0385-z 30926968.

23. Benner C, Spencer CCA, Havulinna AS, Salomaa V, Ripatti S, Pirinen M. FINEMAP: efficient variable selection using summary data from genome-wide association studies. Bioinformatics. 2016;32(10):1493–501. doi: 10.1093/bioinformatics/btw018 26773131

24. Zubair N, Graff M, Ambite JL, Bush WS, Kichaev G, Lu YC, et al. Fine-mapping of lipid regions in global populations discovers ethnic-specific signals and refines previously identified lipid loci. Hum Mol Genet. 2016;25(24):5500–12. doi: 10.1093/hmg/ddw358 28426890

25. Davis JP, Vadlamudi S, Roman TS, Zeynalzadeh M, Iyengar AK, Mohlke KL. Enhancer deletion and allelic effects define a regulatory molecular mechanism at the VLDLR cholesterol GWAS locus. Hum Mol Genet. 2018. Epub 2018/11/18. doi: 10.1093/hmg/ddy385 30445632.

26. Liu X, White S, Peng B, Johnson AD, Brody JA, Li AH, et al. WGSA: an annotation pipeline for human genome sequencing studies. J Med Genet. 2016;53(2):111–2. Epub 2015/09/24. doi: 10.1136/jmedgenet-2015-103423 26395054; PubMed Central PMCID: PMC5124490.

27. Quang D, Chen YF, Xie XH. DANN: a deep learning approach for annotating the pathogenicity of genetic variants. Bioinformatics. 2015;31(5):761–3. doi: 10.1093/bioinformatics/btu703 25338716

28. Dzitoyeva S, Manev H. Reduction of Cellular Lipid Content by a Knockdown of Drosophila PDP1 gamma and Mammalian Hepatic Leukemia Factor. J Lipids. 2013;2013:297932. Epub 2013/09/26. doi: 10.1155/2013/297932 24062952; PubMed Central PMCID: PMC3766575.

29. Ionita-Laza I, McCallum K, Xu B, Buxbaum JD. A spectral approach integrating functional genomic annotations for coding and noncoding variants. Nat Genet. 2016;48(2):214–20. doi: 10.1038/ng.3477 26727659

30. Gamazon ER, Segre AV, van de Bunt M, Wen XQ, Xi HS, Hormozdiari F, et al. Using an atlas of gene regulation across 44 human tissues to inform complex disease- and trait-associated variation. Nat Genet. 2018;50(7):956. doi: 10.1038/s41588-018-0154-4 29955180

31. Nelson CP, Goel A, Butterworth AS, Kanoni S, Webb TR, Marouli E, et al. Association analyses based on false discovery rate implicate new loci for coronary artery disease. Nat Genet. 2017;49(9):1385. doi: 10.1038/ng.3913 28714975

32. Holmen OL, Zhang H, Fan YB, Hovelson DH, Schmidt EM, Zhou W, et al. Systematic evaluation of coding variation identifies a candidate causal variant in TM6SF2 influencing total cholesterol and myocardial infarction risk. Nat Genet. 2014;46(4):345. doi: 10.1038/ng.2926 24633158

33. Ma J, Dempsey AA, Stamatiou D, Marshall KW, Liew CC. Identifying leukocyte gene expression patterns associated with plasma lipid levels in human subjects. Atherosclerosis. 2007;191(1):63–72. doi: 10.1016/j.atherosclerosis.2006.05.032 16806233

34. Hussain MR, Hoessli DC, Fang M. N-acetylgalactosaminyltransferases in cancer. Oncotarget. 2016;7(33):54067–81. Epub 2016/06/21. doi: 10.18632/oncotarget.10042 27322213; PubMed Central PMCID: PMC5288242.

35. Stewart JD, Marchan R, Lesjak MS, Lambert J, Hergenroeder R, Ellis JK, et al. Choline-releasing glycerophosphodiesterase EDI3 drives tumor cell migration and metastasis. Proc Natl Acad Sci U S A. 2012;109(21):8155–60. Epub 2012/05/10. doi: 10.1073/pnas.1117654109 22570503; PubMed Central PMCID: PMC3361409.

36. Willer CJ, Li Y, Abecasis GR. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics. 2010;26(17):2190–1. Epub 2010/07/10. doi: 10.1093/bioinformatics/btq340 20616382; PubMed Central PMCID: PMC2922887.

37. Lin DY, Tao R, Kalsbeek WD, Zeng D, Gonzalez F 2nd, Fernandez-Rhodes L, et al. Genetic association analysis under complex survey sampling: the Hispanic Community Health Study/Study of Latinos. Am J Hum Genet. 2014;95(6):675–88. Epub 2014/12/07. doi: 10.1016/j.ajhg.2014.11.005 25480034; PubMed Central PMCID: PMC4259979.

38. Yang J, Ferreira T, Morris AP, Medland SE, Genetic Investigation of ANthropometric Traits (GIANT) Consortium, DIAbetes Genetics Replication And Meta-analysis (DIAGRAM) Consortium, et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat Genet. 2012;44(4):369–75, S1-3. Epub 2012/03/20. doi: 10.1038/ng.2213 22426310; PubMed Central PMCID: PMC3593158.

39. Shim H, Chasman DI, Smith JD, Mora S, Ridker PM, Nickerson DA, et al. A Multivariate Genome-Wide Association Analysis of 10 LDL Subfractions, and Their Response to Statin Treatment, in 1868 Caucasians. PLoS One. 2015;10(4):e0120758. doi: 10.1371/journal.pone.0120758
