Integration of Genome-Wide SNP Data and Gene-Expression Profiles Reveals Six Novel Loci and Regulatory Mechanisms for Amino Acids and Acylcarnitines in Whole Blood


Human metabolite levels differ between individuals due to environmental and genetic factors. In the present work, we analyzed whole blood levels of amino acids and acylcarnitines, reflecting disease-relevant metabolic pathways, in a cohort of 2,107 individuals. We then performed a genome-wide association analysis to discover genetic variants influencing metabolism. Thereby, we discovered six novel regions in the genome and confirmed ten regions previously found to be associated with metabolites in plasma, serum or urine. Subsequently, we analyzed whether these variants regulate gene-expression in peripheral mononuclear cells, and at several loci we identified novel causal relations between SNPs, gene-expression and metabolite levels. These findings help to explain the functional mechanisms by which associated genetic variants regulate metabolism. Finally, several SNPs associated with blood metabolites in our study overlap with previously identified loci for human diseases (e.g. kidney disease), suggesting a shared genetic basis or pathomechanisms involving metabolic alterations. The identified loci are strong candidates for future functional studies aimed at understanding human metabolism and the pathogenesis of related diseases.

Published in the journal: PLoS Genet 11(9): e32767. doi:10.1371/journal.pgen.1005510
Category: Research Article
doi: https://doi.org/10.1371/journal.pgen.1005510

Summary

Introduction

High-throughput metabolomics experiments using mass spectrometry platforms are becoming an integral part of clinical and systems biology research. Profiling of amino acids and acylcarnitine species in dried whole blood samples of newborns is used worldwide in neonatal screening programs to identify rare inborn errors of metabolism [1]. These diseases are generally caused by rare mutations, leading to loss of function of an enzyme that catalyzes the biochemical reaction of the respective trait. Recently, many of the amino acid and fatty acid metabolites utilized in newborn screening were also implicated in common complex diseases of adults such as cardiovascular disease, insulin resistance and obesity. For example, obesity is accompanied by an increase in circulating levels of multiple amino acids, including branched chain amino acids [2,3], and altered levels of acylcarnitines have been described in type 2 diabetics [4,5]. Amino acids and acylcarnitines show substantial inter-individual variation [6] and a strong genetic contribution to their blood concentrations has been reported [7]. Thus, the integration of genetic and metabolic profiling holds promise for providing novel insights into the regulation of metabolic homeostasis in health and disease.

Indeed, recent studies have identified common genetic variants associated with a variety of circulating metabolites in serum, plasma or urine using different analytical platforms (LC-MS/MS, NMR) [8–24]. However, the complexity of the metabolome cannot be captured by a single technology. Since differences in metabolite abundance have been described between plasma and whole blood [25], we hypothesized that additional genetic determinants affecting the blood metabolome are yet to be discovered.

Thus, we performed an integrated study combining genetics, gene expression and metabolome data (see S1 Fig for the study design). We applied a targeted LC-MS/MS method to measure the abundance of amino acids and acylcarnitines in dried whole blood spots of 2,107 individuals and performed genome-wide association analysis. Top findings were replicated in a second independent European Caucasian cohort of 923 Sorbs. Further, going beyond plain genetic associations, we integrated analyses of mRNA levels in leukocytes to establish causal links between genetic variations, gene-expression levels and metabolites. Finally, we explored whether SNP-metabolite associations identified in our study overlap with previously identified genetic loci for other complex traits or diseases.

Results

Discovery GWAS

Quantitative concentrations of 26 amino acids, 36 acylcarnitines and 34 metabolite ratios were determined in dried whole blood spots of 2,107 participants of the LIFE Leipzig Heart Study using LC-MS/MS. Metabolites and their ratios reflect the metabolic function of various biochemical pathways, e.g. urea cycle, branched chain amino acid metabolism or cellular fatty acid oxidation (see S1 Table for the complete list of phenotypes and their categories). We performed a genome-wide association study (2,619,023 SNPs) for whole blood metabolites and identified 2,261 SNP-metabolite associations (119 after pruning) with p-values <10^-7. These associations comprise 42 metabolites (including 19 ratios) and 866 SNPs (54 lead-SNPs after pruning) at 25 unique genomic locations (Fig 1, S2 Table). QQ-plots and regional association plots for all loci demonstrating valid quality control are presented in the supplemental material (S2 and S3 Figs).

**Fig. 1. GWAS results for amino acids (a) and acylcarnitines (b) in whole blood.**

Replication analysis

Next, replication of top SNPs was sought in an independent cohort of 923 individuals from the Sorb study, where genome-wide SNP and metabolite datasets were available. Good proxies (r²>0.8) for replication analysis in the Sorbs were available for 858 (99.1%) of our 866 top-SNPs, covering 21 of the 25 identified loci and comprising 2,227 associations (well-imputed proxies were not available for the loci at 1q32.3, 3p24.1, 5p15.2 and 20q13.2; see S3 Table for complete results). We observed identical directions of effects for 2,133 (95.8%) combinations of SNPs and metabolites in the replication cohort, resulting in a replication rate of 88.3% when applying an FDR (false discovery rate) of 5% (Fig 2). Replicated lead-SNPs were distributed over 14 of the 21 genomic loci eligible for replication analysis (Table 1; see S3 Table for results of non-replicated loci). In addition, we considered associations at locus #4 (2q34) with glycine and locus #14 (12q24.31) with C4 as validated results, since these loci were already reported in other GWAS for serum metabolites [8,9,13–15]. Moreover, non-lead-SNPs at 12q24.31 were replicated in the Sorbs at the FDR 5% level. None of the other non-replicated loci or loci without proxies in the Sorb study achieved a p-value <10⁻⁸ in our initial GWAS.

**Tab. 1. Results of SNP-metabolite association analyses.**

**Fig. 2. Results of replication analysis.**

In total, our study led to the identification of 16 unique, validated loci for 36 whole blood metabolites (Table 1). At six of the 16 loci we identified associations for blood metabolites for the first time i.e. these loci represent novel findings of our study. Also, we successfully validated ten loci previously reported for serum, plasma, and urine metabolites (Table 1 and S4 Table). At three of these loci, associated metabolites were different from those previously reported. In detail, at locus #3 (2p13.1) we detected associations with Arg and related metabolite ratios, whereas earlier associations were reported for plasma N-acetylornithine and related compounds [8,13,14,16]. Further, at loci #11 (9q34.11) and #15 (15q22.2), we identified associations with methylmalonyl-carnitine, whereas earlier studies reported associations involving the isobaric compound succinyl-carnitine [13,14].

eQTL analysis

To investigate if associated variants have gene regulatory effects, we analyzed our validated lead-SNPs for correlations with gene expression in peripheral blood mononuclear cells (PBMC). Transcriptome data (28,295 eligible transcripts) was available for 2,112 subjects of the LIFE Leipzig Heart study. At an FDR of 5%, 132 eQTLs were identified for 38 of the 45 validated lead-SNPs, affecting the expression of 69 transcripts. Explained variances of eQTLs ranged between 0.4% (corresponding p-value = 3.9x10^-3) and 28.0% (corresponding p-value = 8.0x10^-153, S5 Table).

We observed eQTLs at 14 of the 16 validated loci, including the six novel loci identified in our study (Fig 3 and S7 Fig, Table 2). All 14 loci included lead-SNPs with cis-regulatory effects on gene expression. In addition, novel loci #2 (1q44) and #12 (10q11), as well as reported locus #14 (12q24), also included trans-regulated eQTLs. The trans-eQTLs at locus #2 (1q44) regulating JAM3 expression were inter-chromosomal and particularly strong, explaining about 13.0% of variance (Fig 3 and S7 Fig, Table 2).

**Tab. 2. Results of eQTL analysis of validated loci.**

Integrative analysis of mQTLs, eQTLs and expression-metabolite associations

We next aimed to assess whether changes in expression of identified eQTL genes can explain observed SNP-metabolite associations in our study. Therefore, we analyzed the relationship between expression levels of these genes and metabolites. We found 40 study-wide significant associations between gene expressions and metabolites, corresponding to 9 loci and 18 eQTL transcripts (16 unique genes, see Table 3 and S6 Table).

**Tab. 3. Results of associations between gene-expressions and metabolites.**

We then integrated information from SNP-metabolite (mQTL), SNP-gene expression (eQTL) and expression-metabolite associations to form association triangles. A triangle is defined by a triple of SNP, transcript and metabolite showing pair-wise associations (see methods for details). We constructed a network of all pairs of associations and their strengths (see Fig 4) to illustrate the multiple relationships between associated genetic loci, genes and metabolites. An interactive html-document to explore the network is provided as supplementary material (S4 Fig). Certain overlaps with previously reported molecular interactions exist. These known relationships are summarized in S11 Table. We identified 177 relations containing 21 unique primary associations between features analysed in our study. Additionally, we identified 16 unique molecules potentially connecting features analysed in our study. As expected, these molecules include Proinsulin and Ubiquitin.
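The triangle definition above can be illustrated with a minimal Python sketch. It assumes that each set of significant associations is already available as a set of pairs; the function name `find_triangles` and the identifiers `snp1`, `geneA` and `met1` are hypothetical placeholders, not data from this study.

```python
from collections import defaultdict


def find_triangles(mqtl, eqtl, expr_metab):
    """Return all (snp, transcript, metabolite) triples with pair-wise associations.

    mqtl:       set of (snp, metabolite) pairs        (SNP-metabolite associations)
    eqtl:       set of (snp, transcript) pairs        (SNP-expression associations)
    expr_metab: set of (transcript, metabolite) pairs (expression-metabolite associations)
    """
    # Index metabolites by SNP so each eQTL only checks its own SNP's metabolites.
    metabolites_by_snp = defaultdict(set)
    for snp, metabolite in mqtl:
        metabolites_by_snp[snp].add(metabolite)

    triangles = set()
    for snp, transcript in eqtl:
        for metabolite in metabolites_by_snp.get(snp, ()):
            # The third edge closes the triangle.
            if (transcript, metabolite) in expr_metab:
                triangles.add((snp, transcript, metabolite))
    return triangles


# Toy input (placeholder identifiers only).
toy_mqtl = {("snp1", "met1"), ("snp2", "met2")}
toy_eqtl = {("snp1", "geneA"), ("snp2", "geneB")}
toy_expr_metab = {("geneA", "met1")}
print(find_triangles(toy_mqtl, toy_eqtl, toy_expr_metab))
```

In the toy input only `snp1` closes a triangle, because `geneB` has no association with `met2`.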

**Fig. 4. Network of discovered loci, eQTLs and metabolites.**

Association triangles were further used to test whether variances in gene expression are causally related to variances of metabolite levels. We discovered 36 association triangles mapping to six unique loci, including the two novel loci #2 and #10 at 1q44 and 8q24.3, respectively (S7 Table). To estimate the number of such triangles identified by chance, we performed a comprehensive permutation analysis including mQTL, eQTL and expression-metabolite association analysis (S8 Fig). From this, the empirical likelihood of obtaining the reported triangles by chance was estimated to be <1x10^-15. In particular, only two of our 100 permutations yielded a single triangle, while no triangles were observed in the remaining 98 permutations.

Next, we used Mendelian randomization to establish a causal link between gene expression and the metabolite. We identified 15 metabolite-gene pairs included in 36 triangles (S7 Table). We then investigated whether identified eQTLs explained a significant part of the SNP-metabolite association, which we could demonstrate for a total of five loci (Table 4). The strongest causal effects were found for novel locus #10 at 8q24.3, associated with several Aspartic acid traits (strongest causal effect for the ratio Aspartic acid / Acetylcarnitine via cis-regulation of PPP1R16A), and for locus #11 at 9q34.11, associated with MMA via PPP2R4.
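This section does not state which Mendelian randomization estimator was used. As an illustration only, the sketch below shows the Wald ratio, a common single-instrument estimator, in which the SNP serves as the instrument, gene expression as the exposure and the metabolite as the outcome. The function name and the numeric inputs are hypothetical.

```python
def wald_ratio(beta_snp_metabolite, se_snp_metabolite, beta_snp_expression):
    """Wald ratio estimate of the effect of expression on a metabolite.

    The estimate is the SNP-metabolite effect divided by the SNP-expression
    effect. The standard error is a first-order approximation that ignores
    the uncertainty of the SNP-expression estimate.
    """
    if beta_snp_expression == 0:
        raise ValueError("the instrument has no effect on expression")
    estimate = beta_snp_metabolite / beta_snp_expression
    standard_error = abs(se_snp_metabolite / beta_snp_expression)
    return estimate, standard_error


# Hypothetical effect sizes, chosen only to show the arithmetic.
estimate, standard_error = wald_ratio(0.10, 0.02, 0.50)
print(estimate, standard_error)
```

The ratio is only interpretable as a causal effect under the usual instrumental variable assumptions, in particular that the SNP affects the metabolite only through expression of the gene in question.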

**Tab. 4. Integrative analysis and association triangles.**

Associations with clinical traits and diseases

Finally, we explored whether SNP-metabolite associations identified in our study overlap with genetic loci for clinically relevant traits published in the National Human Genome Research Institute (NHGRI) GWAS Catalog. At nine of the 16 validated loci, metabolite-associated SNPs matched SNPs previously associated with clinical traits or diseases (S9 Table). We observed associations with platelet and red blood cell properties at three loci associated with acylcarnitines in our study (1q44 (C18), 10q11 (C3) and 15q22 (MMA)) [26–28]. Further, we found that several of our variants were associated with clinical chemistry traits, e.g. fibrinogen (2q34) [29], homocysteine (2q34) [30] and traits reflecting lipid metabolism (HDL-cholesterol at 2q34 and 15q22) [31], purine catabolism (uric acid at 10q21) [32], and kidney function (creatinine at 2p13 and 2q34) [33]. At the 2p13 and 2q34 loci, reported associations for creatinine were also linked to chronic kidney disease [34]. In addition, variants at the 2q34 locus for glycine also convey risk for non-small cell lung cancer [35]. Interestingly, recent studies described a key role for glycine in cancer cell proliferation and tumorigenesis [36,37]. Further, metabolite associations at 3q27 (C5OH+HMG), 5q31 (AC-total), 9q34 (MMA) and 15q22 (MMA) overlapped with associations for Parkinson’s Disease [38], Asthma [39], Hypersomnia [40] and orofacial cleft [41], respectively. These co-localizations may indicate a shared genetic basis (pleiotropy) between complex traits and aid in forming new hypotheses regarding molecular pathomechanisms.

Discussion

Several GWAS for urine, serum and plasma metabolites have been published using different measurement approaches [8,9,12–21,23]. Here, we report the first genome-wide association study for amino acid and acylcarnitine levels in whole blood. We discovered 25 loci, of which 14 were replicated in an independent cohort. Two additional loci were strongly supported by mQTL studies in serum or plasma [8,9,13–15]. Of these 16 loci, six describe novel SNP-metabolite associations, comprising four loci associated with various acylcarnitines and two loci associated with amino acids. Our results demonstrate that studying whole blood can provide additional genetic loci not detected in previous mQTL studies for plasma or other body fluids. This might be attributable to differences in metabolite abundance and components of cellular metabolism not present in plasma (or other cell-free body fluids) [25].

Further, we used whole genome expression in peripheral mononuclear cells to establish functional links between SNP-metabolite associations and gene-expression. eQTLs were discovered at 14 loci, including all of our six novel loci. Since eQTL analysis per se does not allow inferring causal genes, we performed gene expression association analysis between eQTL genes and metabolites associated with the corresponding SNP. This is a major advantage of our study, since it allows us to assess causal relationships within the same cohort, whereas most other studies can only report indirect evidence from public eQTL databases. Despite limitations of gene-expression analysis, such as tissue specificity and numerous other ways for genetic variations to influence the function or abundance of proteins, we identified five loci for which a significant part of the SNP-metabolite association was explained by blood eQTLs. These represent novel findings to the best of our knowledge and extend the very few examples of known causal chains between SNPs, gene-expression and metabolites [14,42,43].

Characteristics and functional hypotheses of novel loci

At two of the six newly identified loci (6q23, ARG1 and 21q22, HLCS), rare variants are known to cause autosomal recessive inborn errors of metabolism, providing a strong biological plausibility for the SNP-metabolite associations. Mutations in ARG1 (6q23), encoding arginase, the enzyme which catalyzes the hydrolysis of arginine, are the cause of Argininemia (OMIM #207800). Here, we report common variants of ARG1 to be associated with arginine levels. Likewise, defects in HLCS (21q22) are responsible for holocarboxylase synthetase deficiency (OMIM #253270) with affected individuals displaying elevated levels of C5OH+HMG. In line with this observation, the lead SNP at the HLCS locus exhibited a strong cis-eQTL and the allele responsible for higher HLCS expression was associated with lower C5OH+HMG levels.

A third novel locus (#8; 6q21) associated with multiple acylcarnitines (lead phenotype: AC-total) also contained a gene with a direct biochemical relationship to the associated metabolites, namely SLC22A16, encoding an organic cation/carnitine transporter. Gene expression of SLC22A16 was regulated in cis at this locus, but SLC22A16 gene expression was not correlated with acyl-carnitine concentrations in whole blood. In fact, the strongest SNP-metabolite association at this locus was observed for a non-synonymous coding SNP (rs12210538) in SLC22A16, which is predicted to be damaging by Polyphen and SIFT [44,45]. These findings suggest that associations at 6q21 are more likely driven by this non-synonymous coding mutation than by gene expression of SLC22A16.

The remaining three novel loci relate to candidate genes with no prior connection to metabolism to the best of our knowledge. For the locus at 10q11.21, associated with C2 and C3, we observed cis-effects on ANUBL1 and FAM21C expression, but gene expressions of both transcripts were not correlated with either C2 or C3. Thus, additional work will be required to explore the causal link between genetic variation at the 10q11.21 locus and C2 and C3 blood concentrations.

At novel locus 8q24.3, integration of SNP, eQTL and gene-expression data led to the identification of PPP1R16A as a putative causal gene for the association with aspartic acid and corresponding ratios (lead phenotype: alanine / aspartic acid). While we detected strong cis-effects on expression of two local genes, PPP1R16A and LRRC14, only the eQTL of PPP1R16A partly explained the observed SNP-phenotype associations. Future studies need to address how PPP1R16A, a gene involved in signal transduction [46], may be affecting blood levels of aspartic acid.

Finally, we identified JAM3 encoding the junctional adhesion molecule C (JAM-C) as a novel candidate gene of acylcarnitine metabolism. Top associated SNP rs3811444 (1q44) exhibited an exceptionally strong trans-eQTL for JAM3, located at 11q25. This trans effect was also described by other eQTL studies [47]. Gene expression of JAM3 correlated with several long chain acyl-carnitines (e.g. C16) and explained a significant part of the SNP-metabolite association. JAM-C participates in cell-cell adhesion, leukocyte transmigration and platelet activation. The soluble form of JAM-C has been shown to mediate angiogenesis [48]. Homozygous mutations in JAM3 cause hemorrhagic destruction of the brain, subependymal calcification, and congenital cataracts (HDBSCC, OMIM #613730). At present, the potential functional role of JAM3 in acyl-carnitine metabolism remains elusive.

Novel evidence at known metabolite loci

In addition to the identification of novel loci, we replicated and extended functional evidence for SNP-metabolite associations at ten loci previously described in GWAS for serum or plasma metabolites (Table 1). The majority of these loci contain highly plausible candidate genes based on their biologic function in metabolism (MCCC1, ETFDH, SLC22A4/5, ACADM, ACADS, CPS1, CRAT). Rare loss-of-function mutations in these genes cause Mendelian inborn errors of metabolism, and measuring the respective marker metabolites in whole blood spots is part of neonatal screening programs throughout the world [1]. Here, we validated common variants located in non-coding DNA with modest effect sizes on blood metabolites. Additionally, we found blood eQTLs for MCCC1, ETFDH, SLC22A4/5, ACADM, and CRAT. This is in line with evidence from other complex genetic traits, demonstrating that most associations for common variants arise in non-coding DNA, and emphasizes the importance of regulatory variants in modulating gene expression [49,50]. A striking example is the ACADM locus, where SNPs have been associated with C8 and C10 levels [13,14,20,21]. In our study, gene-expression of ACADM was associated with C8 and C10 blood levels, and we showed for the first time that this relationship was causal, explaining a part of the observed SNP association.

In conclusion, our study expanded the current knowledge on the genetic regulation of human blood metabolites by adding six novel genetic loci. Furthermore, by integrative analysis of SNP, gene expression and metabolite data, we derived mechanistic insights into the molecular regulation of blood metabolites. At several loci, we provide evidence for metabolite regulation via gene-expression and observed overlaps with GWAS loci for other complex traits and diseases, pointing towards potential pathomechanisms via metabolic alterations. Additional functional studies are required to elucidate the cellular mechanisms by which the discovered candidate genes affect metabolic pathways and relate to disease pathology.

Materials and Methods

Cohorts

LIFE Leipzig Heart is an observational study in a Central European population designed to analyze genetic and non-genetic risk factors of atherosclerosis and related vascular and metabolic phenotypes [51]. Patients undergoing first-time diagnostic coronary angiography due to suspected stable CAD with previously untreated coronary arteries, patients with stable left main coronary artery disease and patients with acute myocardial infarction were recruited. The latter were excluded from the present analysis.

The study meets the ethical standards of the Declaration of Helsinki. It has been approved by the Ethics Committee of the Medical Faculty of the University of Leipzig, Germany (Reg. No 276–2005) and is registered at ClinicalTrials.gov (NCT00497887). Written informed consent including agreement with genetic analyses was obtained from all participants. In this analysis, we considered a total of 2,464 individuals. From these, 2,107 had complete genotype, metabolite and covariate data qualifying them for GWAS analysis (descriptive statistics can be found in S9 Table). A subset of 1,856 individuals had complete data of genotypes, gene expression, metabolites and covariates. These individuals were used for integrative analyses (see study design, S1 Fig).

The Sorbs were recruited from the self-contained Sorbs population in Germany [52–54]. All individuals were at fasting state. Phenotyping included standardized questionnaires for past medical history and family history, collection of anthropometric data (weight, height, waist-to-hip ratio) and results from an oral glucose tolerance test. A complete set of high-quality genotype data, metabolites and covariates was available for 923 subjects (S9 Table). The study was approved by the ethics committee of the University of Leipzig and all subjects gave written informed consent before taking part in the study.

Study design

An overview of the study design is presented in S1 Fig. In brief, we first performed a genome-wide metabolite quantitative trait locus (mQTL) analysis in the LIFE Leipzig Heart cohort, with replication of the top-SNPs in the Sorbs cohort. Following this two-stage design, we applied a liberal cut-off of 1.0x10^-7 for the initial GWAS to identify candidate loci. A stringent cut-off was applied at the replication stage, where we controlled the (study-wide) FDR at 5% based on permutation analysis [55]. This accounts for the correlation structure of individuals, SNPs and metabolites and for multiple testing (for details see the section “Genome-wide association analysis and SNP replication” below).

The functional relevance of identified loci was studied in the LIFE Leipzig Heart cohort by analyzing expression quantitative trait loci (eQTL) and gene expression-metabolite associations, followed by causal inference regarding the discovered associations.

Metabolomic analysis and data processing

Venous blood samples were obtained from all study participants and 40 μl of native EDTA whole blood were spotted on filter paper WS 903 (Schleicher and Schüll, Germany) in the LIFE Leipzig Heart study. In the Sorb cohort, 40 μl of cell suspension obtained after plasma centrifugation (10 min at 3500 x g) were spotted on filter paper. All blood spots were stored at -80°C after 3 hours of drying until mass spectrometric analysis. Sample pretreatment and measurement are described elsewhere [56–58]. In brief, 3.0 mm diameter dried blood spot punches (containing 3 μl whole blood) were extracted with methanol containing isotope-labelled standards. After sample extraction and derivatization, analysis was performed on an API 2000 tandem mass spectrometer (Applied Biosystems, Germany). Quantification of 26 amino acids, free carnitine and 34 acylcarnitines including related metabolites was performed using ChemoView 1.4.2 software (Applied Biosystems, Germany). Samples were analysed within 23 analytical batches with two quality control samples in each batch. Mean inter-assay coefficients of variation were below 11% for amino acids and below 19% for acylcarnitines. Further, using these 61 directly measured analytes, we derived a number of biologically relevant sums (n = 1, total acylcarnitine) and ratios (n = 34) to assess reaction equilibria within physiological pathways and processes (e.g. Fischer’s ratio [59]). Consequently, a total of 96 quantities were analyzed as GWAS traits. A list of metabolites and quantities is presented in S1 Table.

Metabolites with more than 20 percent of values below the detection limit were dichotomized for analysis (below detection limit versus above detection limit). This applies to the metabolites C5:1, C6DC, C14OH, C16OH, MeGlut, C18:1OH, C18:2OH, C18OH and C20:3. Quantities were arsinh-transformed (area sinus hyperbolicus), which is close to a log-transformation for large values but does not emphasize differences between small values and can operate on values of zero. Transformed quantities were approximately normally distributed. Values outside of the interval mean ± 5*SD were considered outliers and were removed to stabilize subsequent regression analysis.
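The pre-processing steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names and the toy values are hypothetical, the sample standard deviation is assumed for the outlier rule, and the order in which the steps are applied in the actual pipeline is not specified here.

```python
import math
import statistics


def should_dichotomize(values, detection_limit, max_fraction=0.20):
    """True if more than max_fraction of the values lie below the detection limit."""
    below = sum(1 for v in values if v < detection_limit)
    return below / len(values) > max_fraction


def arsinh_transform(values):
    """Inverse hyperbolic sine, arsinh(x) = ln(x + sqrt(x^2 + 1)); defined at zero."""
    return [math.asinh(v) for v in values]


def remove_outliers(values, n_sd=5.0):
    """Keep only values inside mean +/- n_sd * SD (sample SD assumed)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) <= n_sd * sd]


# For large x, arsinh(x) is close to ln(2x), i.e. a shifted log-transformation.
print(math.asinh(1000.0), math.log(2 * 1000.0))
# At zero the transform is defined, unlike the logarithm.
print(math.asinh(0.0))
```

The comparison of `math.asinh(1000.0)` with `math.log(2000.0)` shows why the transform behaves like a logarithm for large concentrations while remaining usable for values of zero.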

We previously analysed a variety of factors influencing blood metabolites. Age, sex, diabetes and fasting status show pronounced effects on several metabolites while log-BMI, smoking and some blood traits showed effects on selected metabolites. Therefore, we decided to adjust our analyses for these potential confounders.

SNP genotyping and quality control

LIFE Leipzig Heart samples

DNA was extracted from peripheral blood using the Invisorb Spin Blood Maxi Kit (Stratec) as described elsewhere [60]. Samples were genotyped using an Affymetrix Axiom SNP array with custom option comprising a total of 624,908 SNPs. The Axiom CEU array served as a backbone of our custom array. In addition, 62,471 autosomal SNPs were placed on the array corresponding to 44 genomic regions previously associated with cardiovascular disease and metabolic risk factors, in particular plasma lipids. Genotyping was performed at Affymetrix (Santa Clara, CA, USA). 2,925 out of 3,036 DNA samples were successfully genotyped and were called in combination by Affymetrix Power Tools version 1.12. As further sample-wise quality control, we excluded individuals with sex mismatch, call rate <97%, low or high mean squared difference between the individual’s genotype and the expected genotype according to box plot outlier criteria, duplicates, implausible relatedness according to Wang et al. [61] and outliers of principal components analysis (6SD criterion of EIGENSTRAT [62]). Thereafter, a total of N = 2,838 individuals remained for analysis. After the final step of sample quality control, the population genetic structure was homogeneous (see S5 Fig). Based on this sample, we excluded SNPs meeting any of the following criteria: non-autosomal location; minimal plate-wise call rate <90%, i.e. the minimum of the SNP call rate over all plates (this criterion implies that the conventional overall SNP call rate is greater than 94.3% and its 10th percentile is greater than 99.2%); p-value of the asymptotic Hardy-Weinberg equilibrium test <1.0x10^-6; p-value of the association of SNP allele frequency with plate number <1.0x10^-7. A total of 566,359 SNPs passed all criteria.
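One of the SNP filters above is an asymptotic Hardy-Weinberg equilibrium test. The text does not give its exact form; the sketch below assumes the usual Pearson chi-square test with one degree of freedom on genotype counts. The function name and the genotype counts are hypothetical.

```python
import math


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Asymptotic (chi-square, 1 df) Hardy-Weinberg test from genotype counts."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes given")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic SNP: no deviation can be tested
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 df: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


# Counts in exact equilibrium versus a complete lack of heterozygotes.
print(hwe_chi2_pvalue(25, 50, 25))
print(hwe_chi2_pvalue(50, 0, 50) < 1.0e-6)
```

With the cut-off quoted in the text, a SNP would be excluded when the returned p-value falls below 1.0x10^-6.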

Genotype imputation was performed using IMPUTE v2.1.2 (http://mathgen.stats.ox.ac.uk/impute/impute_v2.html). HapMap2 CEU, Release 24, dbSNP build 126, NCBI build 36 served as the reference panel, comprising a total of 3,974,237 autosomal SNPs. 555,911 of our measured SNPs were successfully mapped to the reference. As post-imputation quality control, we discarded all SNPs with minor allele frequency ≤1% or with IMPUTE-info score ≤ 0.3. According to these criteria, a total of 2,619,023 SNPs were analysed.

Sorbs samples

Subjects were genotyped using either the 500K Affymetrix GeneChip or the Affymetrix Genome-Wide Human SNP Array 6.0. For genotype calling, the BRLMM algorithm (Affymetrix, Inc) was applied for 500K and the Birdseed algorithm for Genome-Wide Human SNP Array 6.0. Details of genotyping have been described elsewhere [53]. Quality control of samples was performed as described in Gross et al. [52], resulting in N = 977 individuals with genotypes of good quality (N = 483 genotyped with 500K, N = 494 genotyped with 6.0). Three ethnic outliers were identified by a ‘drop-one-in’ procedure to avoid bias by the relatedness structure within the Sorbs (see [63] for details). These samples were excluded from subsequent analyses. After removing these samples, principal components revealed a homogeneous population structure (see S6 Fig). To account for relatedness, a drop-one-in procedure was used for principal components analysis (see [63] for details).

Genotype imputation was performed without prior SNP filtering and separately for individuals genotyped with 500K and 6.0 respectively as described [63]. The same software and reference panel was used as for the LIFE Leipzig Heart samples.

Genome-wide association analysis and SNP replication

Genome-wide association analyses for 96 blood metabolite traits were performed in the LIFE Leipzig Heart samples (N = 2,107 with complete phenotypes, covariates and high-quality genotypes). Associations were tested by linear regression models using gene-doses of imputed SNPs. We adjusted for age, sex, log-BMI, diabetes status, smoking status, fasting status, haematocrit, platelet count, white blood cell count and the first three genetic principal components. Results revealed no signs of genomic inflation (maximum lambda equal to 1.018, see S10 Table). To avoid reporting of redundant SNP information, the top-SNP list was ordered according to minimal p-values and pruned applying a linkage disequilibrium cut-off of r²<0.3.
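The pruning step is described only briefly. One plausible reading, sketched below as an assumption, is a greedy procedure: SNPs are visited in order of increasing p-value, and a SNP is kept as a lead-SNP only if its r² with every lead-SNP kept so far is below 0.3. The function names, SNP identifiers, p-values and LD values are hypothetical.

```python
def make_r2_lookup(pairwise_r2):
    """Build an r2(snp_a, snp_b) function from a dict keyed by frozenset pairs.

    Pairs that are not listed are treated as unlinked (r2 = 0).
    """
    def r2(snp_a, snp_b):
        if snp_a == snp_b:
            return 1.0
        return pairwise_r2.get(frozenset((snp_a, snp_b)), 0.0)
    return r2


def prune_by_ld(pvalues, r2, threshold=0.3):
    """Greedy LD pruning: keep a SNP if r2 < threshold with all kept lead-SNPs."""
    lead_snps = []
    for snp in sorted(pvalues, key=pvalues.get):  # smallest p-value first
        if all(r2(snp, lead) < threshold for lead in lead_snps):
            lead_snps.append(snp)
    return lead_snps


# Toy example with placeholder identifiers.
toy_pvalues = {"rsA": 1e-12, "rsB": 1e-10, "rsC": 1e-9}
toy_r2 = make_r2_lookup({
    frozenset(("rsA", "rsB")): 0.9,
    frozenset(("rsA", "rsC")): 0.05,
    frozenset(("rsB", "rsC")): 0.1,
})
print(prune_by_ld(toy_pvalues, toy_r2))
```

In the toy example `rsB` is dropped because it is in strong LD with the more significant `rsA`, while `rsC` is retained as a second lead-SNP.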

Replication analysis was performed in the independent cohort of Sorbs (N = 923 with complete genotype and metabolite data) for all combinations of SNPs and metabolites achieving a p-value of <10⁻⁷ in our first-stage GWAS. Based on our unpruned GWAS top-list, we retrieved all SNPs within a ±50 kb window that were successfully imputed in the Sorbs (IMPUTE-info score >0.3 in both the 500K and the 6.0 subsample). Then, on the basis of the LIFE Leipzig Heart data, we assessed which of these SNPs are the best proxies of the corresponding top-SNPs, in order to pair GWAS top-SNPs with optimal proxies of good quality within the Sorbs study.
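The proxy search can be sketched as follows. The window of ±50 kb and the info threshold of 0.3 are taken from the text, and the r² > 0.8 criterion for a good proxy is the one quoted in the Results; the record layout, the function name and the toy values are assumptions made for illustration.

```python
def best_proxy(top_snp_pos, candidates, window=50_000, min_info=0.3, min_r2=0.8):
    """Pick the candidate in strongest LD with the top-SNP, or None.

    Each candidate is a dict with the hypothetical keys:
      'id', 'pos'   identifier and base-pair position
      'r2'          LD with the top-SNP, estimated in the discovery cohort
      'info_500k'   imputation info score in the 500K subsample
      'info_6_0'    imputation info score in the 6.0 subsample
    """
    eligible = [
        c for c in candidates
        if abs(c["pos"] - top_snp_pos) <= window
        and c["info_500k"] > min_info
        and c["info_6_0"] > min_info
    ]
    if not eligible:
        return None
    best = max(eligible, key=lambda c: c["r2"])
    # Only report the proxy if it is in sufficiently strong LD.
    return best if best["r2"] > min_r2 else None


toy_candidates = [
    {"id": "p1", "pos": 100_000, "r2": 0.95, "info_500k": 0.9, "info_6_0": 0.2},
    {"id": "p2", "pos": 120_000, "r2": 0.85, "info_500k": 0.9, "info_6_0": 0.8},
    {"id": "p3", "pos": 400_000, "r2": 0.99, "info_500k": 0.9, "info_6_0": 0.9},
]
print(best_proxy(110_000, toy_candidates))
```

Here `p1` fails the info filter in one subsample and `p3` lies outside the window, so `p2` is selected although it is not the candidate with the highest r².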

Associations between pairs of proxies and metabolites were again analyzed using linear regression analyses of gene-doses. Here, we adjusted for age, sex, log-BMI, diabetes status, smoking status, haematocrit, platelet count, white blood cell count and the relatedness structure ([52,64,65], function “polygenic” of the “GenABEL” package of R was used to deal with the relatedness structure [63]).

Since test statistics are correlated due to LD between SNPs and correlations between metabolites, we decided to control the false-discovery rate (FDR) at 5% rather than family-wise error rates. The null distribution for q-value calculation was determined by permutation analysis. For this purpose, 1000 random permutations of the links between SNPs and metabolites were analyzed.
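The exact q-value procedure follows [55] and is not reproduced in this section. As a rough illustration of the idea, the sketch below uses a generic permutation-based estimator: for a given p-value threshold, the FDR is estimated as the average number of hits per permutation divided by the number of hits in the observed data. The function name and all p-values are hypothetical.

```python
def empirical_fdr(observed_p, null_p_by_permutation, threshold):
    """Generic permutation-based FDR estimate at a p-value threshold.

    observed_p:            p-values from the real SNP-metabolite pairs
    null_p_by_permutation: one list of p-values per permutation of the
                           links between SNPs and metabolites
    """
    n_observed = sum(1 for p in observed_p if p <= threshold)
    if n_observed == 0:
        return 0.0  # nothing is called significant at this threshold
    null_hits = [
        sum(1 for p in permutation if p <= threshold)
        for permutation in null_p_by_permutation
    ]
    mean_null_hits = sum(null_hits) / len(null_hits)
    return min(1.0, mean_null_hits / n_observed)


# Toy p-values: two observed hits, on average 0.5 hits per permutation.
toy_observed = [0.001, 0.002, 0.2, 0.8]
toy_null = [[0.3, 0.6, 0.9, 0.004], [0.5, 0.7, 0.2, 0.1]]
print(empirical_fdr(toy_observed, toy_null, threshold=0.01))
```

Because the permutations preserve the correlation among SNPs and among metabolites, an estimate of this kind does not rely on independent tests.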

Novelty assessment of SNP-metabolite associations and search for pleiotropic effects

We compared our results with published GWAS hits on the basis of the GWAS catalogue (http://www.genome.gov/gwastudies/, downloaded on March 4th, 2014). Required LD information was derived from HapMap3 (release 28) and the 1000 Genomes Project (release 20110521 version 3 f, restricted to SNPs with a MAF ≥ 1%). In addition, further evidence from published mQTL studies was manually included in this analysis to assess the novelty of our results. A total of 13 studies were analyzed [8,9,12–21,23] (see also S4 Table). A locus was considered novel if none of its SNPs was in linkage disequilibrium (r²>0.3) with any published mQTL hit reaching study-wide significance as defined by the authors of the corresponding publication. To increase relevance, we did not match the associated metabolic phenotypes between our study and the published ones, i.e. our approach of considering loci as novel is conservative.

In complete analogy to this analysis, we determined whether our top hits are associated with other traits for which results are published in the GWAS catalogue or reported in two GWAS on plasma lipids [10,31]. These traits could point toward other causal or pleiotropic effects. If applicable, information on genetic disorders related to our loci was retrieved from OMIM (http://omim.org).

Gene-expression measurement and pre-processing

Peripheral blood mononuclear cells were isolated in the LIFE Leipzig Heart cohort using Cell Preparation Tubes (CPT, Becton Dickinson) as previously described [66]. Total RNA was extracted using TRIzol reagent (Invitrogen) and quantified with a UV-Vis spectrophotometer (NanoDrop, Thermo Fisher). For each sample, 500 ng RNA was ethanol-precipitated with GlycoBlue (Invitrogen) as carrier and dissolved at a concentration of 50–300 ng/μl prior to probe synthesis. A total of 2,501 samples were hybridised to Illumina HT-12 v4 Expression BeadChips (Illumina, San Diego, CA, USA) in batches of 48 and scanned on the Illumina HiScan instrument according to the manufacturer’s specifications [60]. Documentation of sample processing included batch information at every processing step, allowing adjustment in subsequent data analysis.

Raw data of all 47,323 probes were extracted with Illumina GenomeStudio; 47,308 probes could be successfully imputed in all samples. Data were further processed within R/Bioconductor [67]. Individuals having an extreme number of expressed genes (defined as outside median ± 3 interquartile ranges (IQR) of the cohort’s values) were excluded. Transcripts not classified as expressed according to Illumina’s internal cut-off as implemented in the “lumi” Bioconductor package (detection p ≤ 0.05 in at least 5% of all samples) were excluded from further analysis. Expression values were quantile-normalised and log2-transformed [68]. For further outlier detection, we calculated the Euclidean distance between each individual and an artificial individual, defined as the average of all samples after removing the 10% of samples farthest from the overall average. Individuals with a distance larger than median + 3 IQR were excluded. Furthermore, we defined for each individual a quantitative measure combining the quality control features available for HT-12 v4 (i.e. ratio of levels of perfect-match vs. mismatch control probes, mean signal of perfect-match control probes, mean of negative control probes and labelling-control probes, ratios of high-concentrated, medium-concentrated and low-concentrated control-probes, mean of house-keeping genes, Euclidean distances of expression values, number of expressed genes, mean signal strength of biotin-control-probes). We calculated the Mahalanobis distance between each individual and an artificial individual having average values for these quality control features. Individuals with a distance larger than median + 3 IQR were excluded. Transcript levels were adjusted for known batch effects using an empirical Bayes method as described [69] and residualised for age, sex, monocyte counts and lymphocyte counts. Additionally, we calculated principal components of the expression data and residualised for the first five principal components to account for unmeasured batch effects [70]. Pre-processing resulted in 28,295 expression probes corresponding to 19,519 genes. Chromosomal mapping of expression probes and assignment of gene names was done using information as reported by the manufacturer (HumanHT-12_V4_0_R2_15002873_B).
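
The Euclidean-distance outlier rule described above (distance to a trimmed average, cut-off at median + 3 IQR) can be sketched as follows. This is a minimal illustration and not the original R code: the function name `flag_expression_outliers`, the input layout and the quartile definition (Python's default "exclusive" method) are assumptions.

```python
import math
from statistics import median, quantiles


def _mean_profile(profiles):
    # column-wise average over a list of equally long expression vectors
    n = len(profiles)
    return [sum(col) / n for col in zip(*profiles)]


def _distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def flag_expression_outliers(samples, trim_fraction=0.10, n_iqr=3.0):
    """samples: dict mapping sample id -> list of expression values.

    Returns the ids whose distance to the trimmed average profile exceeds
    median + n_iqr * IQR of all distances.
    """
    ids = list(samples)
    overall = _mean_profile([samples[i] for i in ids])
    # drop the farthest trim_fraction of samples before averaging again
    by_distance = sorted(ids, key=lambda i: _distance(samples[i], overall))
    n_trim = int(trim_fraction * len(ids))
    core = by_distance[: len(ids) - n_trim]
    reference = _mean_profile([samples[i] for i in core])  # "artificial individual"
    dist = {i: _distance(samples[i], reference) for i in ids}
    q1, _, q3 = quantiles(dist.values(), n=4)
    cutoff = median(dist.values()) + n_iqr * (q3 - q1)
    return [i for i in ids if dist[i] > cutoff]


# Hypothetical toy data: 19 similar profiles and one distant profile.
toy = {f"s{i}": [0.1 * i, -0.1 * i] for i in range(19)}
toy["far"] = [100.0, 100.0]
flagged = flag_expression_outliers(toy)
```

The same median + 3 IQR rule would apply to the Mahalanobis distances of the quality control features, with the distance function exchanged accordingly.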

eQTL and gene-expression association analysis

After quality control, combined SNP and gene-expression data were available for a total of 2,112 individuals, of whom 1,856 had been included in the GWAS. eQTL analysis of the pruned GWAS top-list was performed by linear regression analysis of gene-doses using the R add-on package Matrix eQTL [71]. eQTLs were considered cis-regulated if the distance between the SNP and the centre of the associated expression probe was not larger than 1 Mb; otherwise they were considered trans-regulated. Cis- and trans-specific significance thresholds were derived by a Benjamini-Hochberg (B-H) procedure implemented in Matrix eQTL. For our data, cis-associations with a p-value up to 0.0039 and trans-associations with a p-value up to 3.6×10⁻¹⁴ were considered study-wide significant at FDR<5%. B-H q-values were empirically confirmed by 100 permutation tests (permutation of SNP and gene-expression profiles). Further details can be found elsewhere [72].
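
The cis/trans definition and the Benjamini-Hochberg adjustment can be sketched as follows. This is an illustration only and does not reproduce the Matrix eQTL interface: the function names, the argument layout and the additional requirement that cis pairs lie on the same chromosome are assumptions.

```python
def classify_eqtl(snp_chr, snp_pos, probe_chr, probe_start, probe_end,
                  max_distance=1_000_000):
    """Label a SNP-probe pair as 'cis' or 'trans'.

    A pair is cis if the SNP lies on the same chromosome as the probe
    (assumption) and within max_distance of the probe centre.
    """
    centre = (probe_start + probe_end) / 2
    if snp_chr == probe_chr and abs(snp_pos - centre) <= max_distance:
        return "cis"
    return "trans"


def benjamini_hochberg(p_values):
    """Return B-H adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest p-value down to enforce monotone q-values
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical toy input
label = classify_eqtl("1", 1_500_000, "1", 2_000_000, 2_000_050)
q_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Running the adjustment separately on the cis and the trans pairs yields the two distinct p-value thresholds mentioned in the text, because the number of tests differs strongly between both groups.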

Association analysis of gene-expression and metabolites was performed in 1,957 individuals for whom both data types as well as covariates were available (1,856 of these individuals had been included in the GWAS). Again, we adjusted for age, sex, log-BMI, diabetes status, smoking status, fasting status, haematocrit, platelet count and white blood cell count. FDR was controlled at 5%.

As we observed multiple relationships between genetic loci, gene expression and metabolites, we visualized all associations found at FDR 5% in a network. Previously published relations were identified by mapping genetic loci, genes, and metabolites from the mQTL, eQTL, and gene-expression-metabolite association analyses to QIAGEN’s Ingenuity Pathway Analysis (IPA, QIAGEN Redwood City, www.qiagen.com/ingenuity; as of May 2015). This database includes, among much other information, data on genome-wide protein-protein interactions, activation / co-localization and enzymatic reactions. Significantly associated SNPs were represented by the three most proximal genes, and metabolite ratios by their individual numerator and denominator.

Identification of association triangles

For a more detailed characterization of the observed SNP-metabolite associations, we integrated genotype, gene expression and metabolite data to construct association triangles. A triangle is defined as a SNP that is significantly associated with both a certain expression probe and a certain metabolite, where the expression probe must also be associated with the metabolite. For this purpose, we first determined the top associated SNP per locus, its corresponding best associated metabolite and the eQTLs of that SNP (FDR = 5%, see above). The resulting triples of SNP, transcript level of the eQTL and metabolite level were restricted to those showing a significant association between mRNA expression and metabolite level (FDR = 5%, see above). These gene-expressions were considered as possible explanatory quantities of the SNP-metabolite association.
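
The triangle definition amounts to intersecting three lists of significant pairs. The sketch below illustrates this under the assumption that each input has already been restricted as described (top SNP per locus, best metabolite, FDR 5%). The function name and all identifiers in the toy input are hypothetical.

```python
def association_triangles(mqtl, eqtl, expr_metab):
    """Find SNP / expression probe / metabolite triangles.

    mqtl:       set of (snp, metabolite) pairs with a significant mQTL
    eqtl:       set of (snp, probe) pairs with a significant eQTL
    expr_metab: set of (probe, metabolite) pairs with a significant
                expression-metabolite association
    Returns the set of (snp, probe, metabolite) triples in which all
    three pairwise associations are present.
    """
    probes_by_snp = {}
    for snp, probe in eqtl:
        probes_by_snp.setdefault(snp, set()).add(probe)
    return {
        (snp, probe, metab)
        for snp, metab in mqtl
        for probe in probes_by_snp.get(snp, ())
        if (probe, metab) in expr_metab
    }


# Hypothetical toy input: only snpA closes a triangle via probe1.
triangles = association_triangles(
    mqtl={("snpA", "metX"), ("snpB", "metY")},
    eqtl={("snpA", "probe1"), ("snpA", "probe2"), ("snpB", "probe3")},
    expr_metab={("probe1", "metX"), ("probe3", "metX")},
)
```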

We simulated the expected number of these association triangles under the null distribution by a comprehensive permutation analysis: in each of 100 permutations, we randomly assigned expression datasets and metabolic datasets to genetic datasets. We analysed these datasets for mQTLs, eQTLs, and gene-expression associations in accordance with our original analysis. For each of these 100 permutation-based datasets, we counted the number of pairwise associations and association triangles and compared it with the results of our original dataset. We calculated the empirical likelihood of triangles by comparing the observed number of six triangles with the number of triangles under the null assuming a Poisson distribution.
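
A Poisson comparison of this kind can be sketched as follows. The text does not state how the Poisson rate was estimated, so this illustration assumes that the rate is the mean triangle count over the permutations and that the quantity of interest is the upper-tail probability. The function name and the toy counts are hypothetical.

```python
import math


def poisson_upper_tail(observed, null_counts):
    """P(X >= observed) for X ~ Poisson(mean of null_counts).

    observed:    number of triangles in the original data
    null_counts: triangle counts from the permuted datasets
    """
    rate = sum(null_counts) / len(null_counts)
    below = sum(
        math.exp(-rate) * rate ** k / math.factorial(k)
        for k in range(observed)
    )
    return max(0.0, 1.0 - below)


# Hypothetical toy counts with a mean of one triangle per permutation
tail_one = poisson_upper_tail(1, [1, 1, 1])
```

With six observed triangles, the call would be `poisson_upper_tail(6, null_counts)`, where `null_counts` holds the 100 permutation-based counts.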

In order to exclude spurious correlation between gene-expression and metabolites as a cause of the observed association, we performed a Mendelian randomization analysis using our eQTL SNPs as instrumental variables [73]. In general, it is not easy to prove that the conditions of Mendelian randomization are fulfilled. In particular, a direct SNP effect on metabolites cannot be excluded, which would violate one of the assumptions [74]. Therefore, we adapted the Mendelian randomization analysis by using the residuals of metabolites with respect to the remaining direct SNP effects (see also S1 Text for an extended discussion). Standard errors of Mendelian randomization effects were derived by the jackknife [75].
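
For orientation, the standard single-instrument (ratio) estimator and the leave-one-out jackknife standard error can be written as below. The notation is our own assumption: G denotes the gene-dose of the eQTL SNP, E the expression level, M the metabolite level, and the subscript (-i) an estimate computed without individual i. The adapted analysis based on residuals is described in S1 Text and is not reproduced here.

```latex
\hat{\beta}_{\mathrm{IV}}
  = \frac{\widehat{\operatorname{cov}}(G, M)}{\widehat{\operatorname{cov}}(G, E)},
\qquad
\widehat{\mathrm{SE}}_{\mathrm{jack}}\bigl(\hat{\beta}_{\mathrm{IV}}\bigr)
  = \sqrt{\frac{n-1}{n} \sum_{i=1}^{n}
      \Bigl(\hat{\beta}_{\mathrm{IV},(-i)} - \bar{\beta}_{\mathrm{IV},(\cdot)}\Bigr)^{2}},
\qquad
\bar{\beta}_{\mathrm{IV},(\cdot)} = \frac{1}{n} \sum_{i=1}^{n} \hat{\beta}_{\mathrm{IV},(-i)}
```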

Furthermore, we tested whether gene-expressions explain at least parts of the observed mQTL associations. A subset of 1,856 individuals for whom SNP, gene-expression, metabolite and covariate data were available was eligible for this purpose. We analysed regression models of metabolites in dependence on SNPs and covariates, with and without gene-expression. We asked whether the absolute value of the beta-estimator of the SNP is reduced if gene-expression is added to the model. In this case, gene-expression explains a part of the observed SNP-metabolite association. The difference of these SNP beta-estimators was tested against zero by calculating jackknife standard errors. This analysis also provides evidence for causal relations between genetic variants, gene-expression levels and metabolite concentrations. Since we observed that it is more stringent and conservative than Mendelian randomization analysis, our conclusions regarding causality are based on this type of analysis.
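
The comparison of SNP beta-estimators with and without gene-expression can be sketched as follows. This is a simplified illustration: covariates are omitted, the two-predictor least-squares solution is written in closed form, and all function names and the toy data are hypothetical.

```python
import math


def _cov(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


def snp_betas(snp, expr, metab):
    """SNP effect on the metabolite without and with expression in the model."""
    v_s, v_e = _cov(snp, snp), _cov(expr, expr)
    c_se, c_sm, c_em = _cov(snp, expr), _cov(snp, metab), _cov(expr, metab)
    without_expr = c_sm / v_s
    with_expr = (c_sm * v_e - c_em * c_se) / (v_s * v_e - c_se ** 2)
    return without_expr, with_expr


def attenuation(snp, expr, metab):
    """Reduction of |beta_SNP| when expression is added to the model."""
    b0, b1 = snp_betas(snp, expr, metab)
    return abs(b0) - abs(b1)


def jackknife_se(snp, expr, metab):
    """Leave-one-out jackknife standard error of the attenuation."""
    n = len(snp)
    loo = []
    for i in range(n):
        keep = [j for j in range(n) if j != i]
        loo.append(attenuation([snp[j] for j in keep],
                               [expr[j] for j in keep],
                               [metab[j] for j in keep]))
    mean_loo = sum(loo) / n
    return math.sqrt((n - 1) / n * sum((t - mean_loo) ** 2 for t in loo))


# Hypothetical toy data: the metabolite depends on the SNP only via expression.
toy_snp = [0, 1, 2, 0, 1, 2, 0, 1, 2, 1]
toy_noise = [0.3, -0.2, 0.1, -0.4, 0.5, 0.0, 0.2, -0.1, -0.3, 0.4]
toy_expr = [s + d for s, d in zip(toy_snp, toy_noise)]
toy_metab = [2.0 * e for e in toy_expr]
toy_delta = attenuation(toy_snp, toy_expr, toy_metab)
toy_se = jackknife_se(toy_snp, toy_expr, toy_metab)
```

Dividing the attenuation by its jackknife standard error gives a test statistic for the difference against zero. In the toy data, the SNP effect vanishes once expression is in the model, so the attenuation equals the unadjusted effect.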

To gain additional insights into possible functional mechanisms of our loci, we performed the same analysis for all independently associated top-SNPs.

Supporting Information

References

1. Lehotay DC, Hall P, Lepage J, Eichhorst JC, Etter ML et al. (2011) LC-MS/MS progress in newborn screening. Clinical biochemistry 44 (1): 21–31. doi: 10.1016/j.clinbiochem.2010.08.007 20709048

2. Newgard CB (2012) Interplay between lipids and branched-chain amino acids in development of insulin resistance. Cell metabolism 15 (5): 606–614. doi: 10.1016/j.cmet.2012.01.024 22560213

3. Newgard CB, An J, Bain JR, Muehlbauer MJ, Stevens RD et al. (2009) A branched-chain amino acid-related metabolic signature that differentiates obese and lean humans and contributes to insulin resistance. Cell metabolism 9 (4): 311–326. doi: 10.1016/j.cmet.2009.02.002 19356713

4. Adams SH, Hoppel CL, Lok KH, Zhao L, Wong SW et al. (2009) Plasma acylcarnitine profiles suggest incomplete long-chain fatty acid beta-oxidation and altered tricarboxylic acid cycle activity in type 2 diabetic African-American women. The Journal of nutrition 139 (6): 1073–1081. doi: 10.3945/jn.108.103754 19369366

5. Mihalik SJ, Goodpaster BH, Kelley DE, Chace DH, Vockley J et al. (2010) Increased levels of plasma acylcarnitines in obesity and type 2 diabetes and identification of a marker of glucolipotoxicity. Obesity (Silver Spring, Md.) 18 (9): 1695–1700.

6. Brauer HA, Libby TE, Mitchell BL, Li L, Chen C et al. (2011) Cruciferous vegetable supplementation in a controlled diet study alters the serum peptidome in a GSTM1-genotype dependent manner. Nutrition journal 10 : 11. doi: 10.1186/1475-2891-10-11 21272319

7. Shah SH, Hauser ER, Bain JR, Muehlbauer MJ, Haynes C et al. (2009) High heritability of metabolomic profiles in families burdened with premature cardiovascular disease. Molecular systems biology 5 : 258. doi: 10.1038/msb.2009.11 19357637

8. Yu B, Zheng Y, Alexander D, Morrison AC, Coresh J et al. (2014) Genetic determinants influencing human serum metabolome among African Americans. PLoS Genet 10 (3): e1004212. doi: 10.1371/journal.pgen.1004212 24625756

9. Xie W, Wood AR, Lyssenko V, Weedon MN, Knowles JW et al. (2013) Genetic variants associated with glycine metabolism and their role in insulin sensitivity and type 2 diabetes. Diabetes 62 (6): 2141–2150. doi: 10.2337/db12-0876 23378610

10. Tukiainen T, Kettunen J, Kangas AJ, Lyytikäinen L, Soininen P et al. (2012) Detailed metabolic and genetic characterization reveals new associations for 30 known lipid loci. Human molecular genetics 21 (6): 1444–1455. doi: 10.1093/hmg/ddr581 22156771

11. Tanaka T, Shen J, Abecasis GR, Kisialiou A, Ordovas JM et al. (2009) Genome-wide association study of plasma polyunsaturated fatty acids in the InCHIANTI Study. PLoS genetics 5 (1): e1000338. doi: 10.1371/journal.pgen.1000338 19148276

12. Suhre K, Wallaschofski H, Raffler J, Friedrich N, Haring R et al. (2011) A genome-wide association study of metabolic traits in human urine. Nature genetics 43 (6): 565–569. doi: 10.1038/ng.837 21572414

13. Suhre K, Shin SY, Petersen AK, Mohney RP, Meredith D et al. (2011) Human metabolic individuality in biomedical and pharmaceutical research. Nature 477 (7362): 54–60. doi: 10.1038/nature10354 21886157

14. Shin SY, Fauman EB, Petersen AK, Krumsiek J, Santos R et al. (2014) An atlas of genetic influences on human blood metabolites. Nat Genet 46 (6): 543–550. doi: 10.1038/ng.2982 24816252

15. Rhee EP, Ho JE, Chen MH, Shen D, Cheng S et al. (2013) A genome-wide association study of the human metabolome in a community-based cohort. Cell Metab 18 (1): 130–143. doi: 10.1016/j.cmet.2013.06.013 23823483

16. Nicholson G, Rantalainen M, Li JV, Maher AD, Malmodin D et al. (2011) A genome-wide metabolic QTL analysis in Europeans implicates two loci shaped by recent positive selection. PLoS genetics 7 (9): e1002270. doi: 10.1371/journal.pgen.1002270 21931564

17. Luykx JJ, Bakker SC, Lentjes E, Neeleman M, Strengman E et al. (2014) Genome-wide association study of monoamine metabolite levels in human cerebrospinal fluid. Molecular psychiatry 19 (2): 228–234. doi: 10.1038/mp.2012.183 23319000

18. Kettunen J, Tukiainen T, Sarin A, Ortega-Alonso A, Tikkanen E et al. (2012) Genome-wide association study identifies multiple loci influencing human serum metabolite levels. Nature genetics 44 (3): 269–276. doi: 10.1038/ng.1073 22286219

19. Inouye M, Ripatti S, Kettunen J, Lyytikäinen L, Oksala N et al. (2012) Novel Loci for metabolic networks and multi-tissue expression studies reveal genes for atherosclerosis. PLoS genetics 8 (8): e1002907. doi: 10.1371/journal.pgen.1002907 22916037

20. Illig T, Gieger C, Zhai G, Römisch-Margl W, Wang-Sattler R et al. (2010) A genome-wide perspective of genetic variation in human metabolism. Nature genetics 42 (2): 137–141. doi: 10.1038/ng.507 20037589

21. Hong M, Karlsson R, Magnusson Patrik K E, Lewis MR, Isaacs W et al. (2013) A genome-wide assessment of variability in human serum metabolism. Human mutation 34 (3): 515–524. doi: 10.1002/humu.22267 23281178

22. Hicks AA, Pramstaller PP, Johansson A, Vitart V, Rudan I et al. (2009) Genetic determinants of circulating sphingolipid concentrations in European populations. PLoS genetics 5 (10): e1000672. doi: 10.1371/journal.pgen.1000672 19798445

23. Dharuri H, Henneman P, Demirkan A, van Klinken Jan Bert, Mook-Kanamori DO et al. (2013) Automated workflow-based exploitation of pathway databases provides new insights into genetic associations of metabolite profiles. BMC Genomics 14 : 865. doi: 10.1186/1471-2164-14-865 24320595

24. Demirkan A, van Duijn CM, Ugocsai P, Isaacs A, Pramstaller PP et al. (2012) Genome-wide association study identifies novel loci associated with circulating phospho- and sphingolipid concentrations. PLoS genetics 8 (2): e1002490. doi: 10.1371/journal.pgen.1002490 22359512

25. de Sain-van der Velden MGM, Diekman EF, Jans JJ, van der Ham M, Prinsen BHCMT et al. (2013) Differences between acylcarnitine profiles in plasma and bloodspots. Molecular genetics and metabolism 110 (1–2): 116–121. 23639448

26. Gieger C, Radhakrishnan A, Cvejic A, Tang W, Porcu E et al. (2011) New gene functions in megakaryopoiesis and platelet formation. Nature 480 (7376): 201–208. doi: 10.1038/nature10659 22139419

27. van der Harst Pim, Zhang W, Mateo Leach I, Rendon A, Verweij N et al. (2012) Seventy-five genetic loci influencing the human red blood cell. Nature 492 (7429): 369–375. doi: 10.1038/nature11677 23222517

28. Kamatani Y, Matsuda K, Okada Y, Kubo M, Hosono N et al. (2010) Genome-wide association study of hematological and biochemical traits in a Japanese population. Nature genetics 42 (3): 210–215. doi: 10.1038/ng.531 20139978

29. Danik JS, Paré G, Chasman DI, Zee Robert Y L, Kwiatkowski DJ et al. (2009) Novel loci, including those related to Crohn disease, psoriasis, and inflammation, identified in a genome-wide association study of fibrinogen in 17 686 women: the Women's Genome Health Study. Circulation. Cardiovascular genetics 2 (2): 134–141. doi: 10.1161/CIRCGENETICS.108.825273 20031577

30. Lange LA, Croteau-Chonka DC, Marvelle AF, Qin L, Gaulton KJ et al. (2010) Genome-wide association study of homocysteine levels in Filipinos provides evidence for CPS1 in women and a stronger MTHFR effect in young adults. Hum Mol Genet 19 (10): 2050–2058. doi: 10.1093/hmg/ddq062 20154341

31. Willer CJ, Schmidt EM, Sengupta S, Peloso GM, Gustafsson S et al. (2013) Discovery and refinement of loci associated with lipid levels. Nat Genet 45 (11): 1274–1283. doi: 10.1038/ng.2797 24097068

32. Kolz M, Johnson T, Sanna S, Teumer A, Vitart V et al. (2009) Meta-analysis of 28,141 individuals identifies common variants within five new loci that influence uric acid concentrations. PLoS genetics 5 (6): e1000504. doi: 10.1371/journal.pgen.1000504 19503597

33. Chambers JC, Zhang W, Lord GM, van der Harst P, Lawlor DA et al. (2010) Genetic loci influencing kidney function and chronic kidney disease. Nat Genet 42 (5): 373–375. doi: 10.1038/ng.566 20383145

34. Kottgen A, Pattaro C, Boger CA, Fuchsberger C, Olden M et al. (2010) New loci associated with kidney function and chronic kidney disease. Nat Genet 42 (5): 376–384. doi: 10.1038/ng.568 20383146

35. Lee Y, Yoon KA, Joo J, Lee D, Bae K et al. (2013) Prognostic implications of genetic variants in advanced non-small cell lung cancer. a genome-wide association study. Carcinogenesis 34 (2): 307–313. doi: 10.1093/carcin/bgs356 23144319

36. Zhang WC, Shyh-Chang N, Yang H, Rai A, Umashankar S et al. (2012) Glycine decarboxylase activity drives non-small cell lung cancer tumor-initiating cells and tumorigenesis. Cell 148 (1–2): 259–272. doi: 10.1016/j.cell.2011.11.050 22225612

37. Jain M, Nilsson R, Sharma S, Madhusudhan N, Kitami T et al. (2012) Metabolite profiling identifies a key role for glycine in rapid cancer cell proliferation. Science (New York, N.Y.) 336 (6084): 1040–1044.

38. Do CB, Tung JY, Dorfman E, Kiefer AK, Drabant EM et al. (2011) Web-based genome-wide association study identifies two novel loci and a substantial genetic component for Parkinson's disease. PLoS genetics 7 (6): e1002141. doi: 10.1371/journal.pgen.1002141 21738487

39. Moffatt MF, Gut IG, Demenais F, Strachan DP, Bouzigon E et al. (2010) A large-scale, consortium-based genomewide association study of asthma. The New England journal of medicine 363 (13): 1211–1221. doi: 10.1056/NEJMoa0906312 20860503

40. Khor S, Miyagawa T, Toyoda H, Yamasaki M, Kawamura Y et al. (2013) Genome-wide association study of HLA-DQB1*06:02 negative essential hypersomnia. PeerJ 1: e66. doi: 10.7717/peerj.66 23646285

41. Ludwig KU, Mangold E, Herms S, Nowak S, Reutter H et al. (2012) Genome-wide meta-analyses of nonsyndromic cleft lip with or without cleft palate identify six new risk loci. Nature genetics 44 (9): 968–971. doi: 10.1038/ng.2360 22863734

42. Rueedi R, Ledda M, Nicholls AW, Salek RM, Marques-Vidal P et al. (2014) Genome-wide association study of metabolic traits reveals novel gene-metabolite-disease links. PLoS genetics 10 (2): e1004132. doi: 10.1371/journal.pgen.1004132 24586186

43. Schramm K, Marzi C, Schurmann C, Carstensen M, Reinmaa E et al. (2014) Mapping the genetic architecture of gene regulation in whole blood. PLoS One 9 (4): e93844. doi: 10.1371/journal.pone.0093844 24740359

44. Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A et al. (2010) A method and server for predicting damaging missense mutations. Nature methods 7 (4): 248–249. doi: 10.1038/nmeth0410-248 20354512

45. Ng PC, Henikoff S (2001) Predicting deleterious amino acid substitutions. Genome research 11 (5): 863–874. 11337480

46. Sueyoshi T, Moore R, Sugatani J, Matsumura Y, Negishi M (2008) PPP1R16A, the membrane subunit of protein phosphatase 1beta, signals nuclear translocation of the nuclear receptor constitutive active/androstane receptor. Molecular pharmacology 73 (4): 1113–1121. doi: 10.1124/mol.107.042960 18202305

47. Westra H, Peters MJ, Esko T, Yaghootkar H, Schurmann C et al. (2013) Systematic identification of trans eQTLs as putative drivers of known disease associations. Nature genetics 45 (10): 1238–1243. doi: 10.1038/ng.2756 24013639

48. Rabquer BJ, Amin MA, Teegala N, Shaheen MK, Tsou P et al. (2010) Junctional adhesion molecule-C is a soluble mediator of angiogenesis. Journal of immunology (Baltimore, Md.: 1950) 185 (3): 1777–1785.

49. Albert FW, Kruglyak L (2015) The role of regulatory variation in complex traits and disease. Nature reviews. Genetics.

50. Maurano MT, Humbert R, Rynes E, Thurman RE, Haugen E et al. (2012) Systematic localization of common disease-associated variation in regulatory DNA. Science (New York, N.Y.) 337 (6099): 1190–1195.

51. Beutner F, Teupser D, Gielen S, Holdt LM, Scholz M et al. (2011) Rationale and design of the Leipzig (LIFE) Heart Study. phenotyping and cardiovascular characteristics of patients with coronary artery disease. PLoS One 6 (12): e29070. doi: 10.1371/journal.pone.0029070 22216169

52. Gross A, Tonjes A, Kovacs P, Veeramah KR, Ahnert P et al. (2011) Population-genetic comparison of the Sorbian isolate population in Germany with the German KORA population using genome-wide SNP arrays. BMC Genet 12 : 67. doi: 10.1186/1471-2156-12-67 21798003

53. Tonjes A, Koriath M, Schleinitz D, Dietrich K, Bottcher Y et al. (2009) Genetic variation in GPR133 is associated with height. genome wide association study in the self-contained population of Sorbs. Hum Mol Genet 18 (23): 4662–4668. doi: 10.1093/hmg/ddp423 19729412

54. Veeramah KR, Tonjes A, Kovacs P, Gross A, Wegmann D et al. (2011) Genetic variation in the Sorbs of eastern Germany in the context of broader European genetic diversity. Eur J Hum Genet 19 (9): 995–1001. doi: 10.1038/ejhg.2011.65 21559053

55. Hirschhorn JN, Daly MJ (2005) Genome-wide association studies for common diseases and complex traits. Nature reviews. Genetics 6 (2): 95–108. 15716906

56. Ceglarek U, Muller P, Stach B, Buhrdel P, Thiery J et al. (2002) Validation of the phenylalanine/tyrosine ratio determined by tandem mass spectrometry. sensitive newborn screening for phenylketonuria. Clin Chem Lab Med 40 (7): 693–697. 12241016

57. Ceglarek U, Leichtle A, Brugel M, Kortz L, Brauer R et al. (2009) Challenges and developments in tandem mass spectrometry based clinical metabolomics. Mol Cell Endocrinol 301 (1–2): 266–271. doi: 10.1016/j.mce.2008.10.013 19007853

58. Brauer R, Leichtle AB, Fiedler GM, Thiery J, Ceglarek U (2011) Preanalytical standardization of amino acid and acylcarnitine metabolite profiling in human blood using tandem mass spectrometry. Metabolomics 7 (3): 344–352.

59. Fischer JE, Rosen HM, Ebeid AM, James JH, Keane JM et al. (1976) The effect of normalization of plasma amino acids on hepatic encephalopathy in man. Surgery 80 (1): 77–91. 818729

60. Holdt LM, Hoffmann S, Sass K, Langenberger D, Scholz M et al. (2013) Alu elements in ANRIL non-coding RNA at chromosome 9p21 modulate atherogenic cell functions through trans-regulation of gene networks. PLoS Genet 9 (7): e1003588. doi: 10.1371/journal.pgen.1003588 23861667

61. Wang J (2002) An estimator for pairwise relatedness using molecular markers. Genetics 160 (3): 1203–1215. 11901134

62. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA et al. (2006) Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 38 (8): 904–909. 16862161

63. Tönjes A, Scholz M, Breitfeld J, Marzi C, Grallert H et al. (2014) Genome Wide Meta-analysis Highlights the Role of Genetic Variation in RARRES2 in the Regulation of Circulating Serum Chemerin. PLoS genetics 10 (12): e1004854. doi: 10.1371/journal.pgen.1004854 25521368

64. Amin N, van Duijn Cornelia M, Aulchenko YS (2007) A genomic background based method for association analysis in related individuals. PLoS One 2 (12): e1274. 18060068

65. Aulchenko YS, Koning de D, Haley C (2007) Genomewide rapid association using mixed model and regression: a fast and simple method for genomewide pedigree-based quantitative trait loci association analysis. Genetics 177 (1): 577–585. 17660554

66. Holdt LM, Beutner F, Scholz M, Gielen S, Gabel G et al. (2010) ANRIL expression is associated with atherosclerosis risk at chromosome 9p21. Arterioscler Thromb Vasc Biol 30 (3): 620–627. doi: 10.1161/ATVBAHA.109.196832 20056914

67. Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M et al. (2004) Bioconductor. open software development for computational biology and bioinformatics. Genome Biol 5 (10): R80. 15461798

68. Schmid R, Baum P, Ittrich C, Fundel-Clemens K, Huber W et al. (2010) Comparison of normalization methods for Illumina BeadChip HumanHT-12 v3. BMC Genomics 11 : 349. doi: 10.1186/1471-2164-11-349 20525181

69. Johnson WE, Li C, Rabinovic A (2007) Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8 (1): 118–127. 16632515

70. Fehrmann RS, Jansen RC, Veldink JH, Westra HJ, Arends D et al. (2011) Trans-eQTLs reveal that independent genetic variants associated with a complex phenotype converge on intermediate genes, with a major role for the HLA. PLoS Genet 7 (8): e1002197. doi: 10.1371/journal.pgen.1002197 21829388

71. Shabalin AA (2012) Matrix eQTL. ultra fast eQTL analysis via large matrix operations. Bioinformatics 28 (10): 1353–1358. doi: 10.1093/bioinformatics/bts163 22492648

72. Kirsten H, Al-Hasani H, Holdt LM, Gross A, Beutner F et al. (2014) Dissecting the Genetics of the Human Transcriptome identifies novel trait-related trans-eQTLs and corroborates the regulatory relevance of non-protein coding loci (submitted).

73. Nelson CR, Startz R (1988) The distribution of the instrumental variables estimator and its t-ratio when the instrument is a poor one. NBER TECHNICAL WORKING PAPER SERIES (#69).

74. Lawlor DA, Harbord RM, Sterne Jonathan A C, Timpson N, Davey Smith G (2008) Mendelian randomization: using genes as instruments for making causal inferences in epidemiology. Statistics in medicine 27 (8): 1133–1163. 17886233

75. Efron B (1981) Nonparametric estimates of standard error: the jackknife, the bootstrap and other methods. Biometrika 68 (3): 589–599.