Genome-wide association study identifies 16 genomic regions associated with circulating cytokines at birth

Authors: Yunpeng Wang ^aff001; Ron Nudel ^aff001; Michael E. Benros ^aff001; Kristin Skogstrand ^aff001; Simon Fishilevich ^aff008; ; Doron Lancet ^aff008; Jiangming Sun ^aff001; David M. Hougaard ^aff001; Ole A. Andreassen ^aff003; Preben Bo Mortensen ^aff001; Alfonso Buil ^aff001; Thomas F. Hansen ^aff001; Wesley K. Thompson ^aff001; Thomas Werge ^aff001
Authors place of work: The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Denmark ^aff001; Institute of Biological Psychiatry, Mental Health Center St. Hans, Mental Health Services Copenhagen, Denmark ^aff002; Norwegian Centre for Mental Disorders Research (NORMENT), Institute of Clinical Medicine, University of Oslo, and Oslo University Hospital, Norway ^aff003; Lifespan Changes in Brain and Cognition (LCBC), Department of Psychology, University of Oslo, Norway ^aff004; Mental Health Center Copenhagen, Copenhagen University Hospital, Denmark ^aff005; Department of Immunology and Microbiology, Faculty of Health and Medical Sciences, University of Copenhagen, Denmark ^aff006; Danish Centre for Neonatal Screening, Department of Congenital Diseases, Statens Serum Institut, Denmark ^aff007; Department of Molecular Genetics, Weizmann Institute of Science, Israel ^aff008; Department of Economics and Business Economics-National Centre for Register-based Research, University of Aarhus, Denmark ^aff009; Danish Headache Center, Department of Neurology, University Hospital Copenhagen, Denmark ^aff010; Division of Biostatistics, Department of Family Medicine and Public Health, University of California, San Diego, California, United States of America ^aff011; Department of Clinical Medicine, University of Copenhagen, Denmark ^aff012
Published in the journal: Genome-wide association study identifies 16 genomic regions associated with circulating cytokines at birth. PLoS Genet 16(11): e1009163. doi:10.1371/journal.pgen.1009163
Category: Research Article
doi: https://doi.org/10.1371/journal.pgen.1009163

Summary

Circulating inflammatory markers are essential to human health and disease, and they are often dysregulated or malfunctioning in cancers as well as in cardiovascular, metabolic, immunologic and neuropsychiatric disorders. However, the genetic contribution to the physiological variation of levels of circulating inflammatory markers is largely unknown. Here we report the results of a genome-wide genetic study of blood concentration of ten cytokines, including the hitherto unexplored calcium-binding protein (S100B). The study leverages a unique sample of neonatal blood spots from 9,459 Danish subjects from the iPSYCH initiative. We estimate the SNP-heritability of marker levels as ranging from essentially zero for Erythropoietin (EPO) up to 73% for S100B. We identify and replicate 16 associated genomic regions (p < 5 x 10⁻⁹), of which four are novel. We show that the associated variants map to enhancer elements, suggesting a possible transcriptional effect of genomic variants on the cytokine levels. The identification of the genetic architecture underlying the basic levels of cytokines is likely to prompt studies investigating the relationship between cytokines and complex disease. Our results also suggest that the genetic architecture of cytokines is stable from neonatal to adult life.

Keywords:

Cytokines – Gene regulation – Genetics of disease – Genome annotation – Genomics – Inflammatory diseases – Single nucleotide polymorphisms

Introduction

Circulating inflammatory markers are essential to human health and disease [1]. An important group of small circulating proteins are cytokines. These have important roles in cell signaling in general and in modulating immune function in particular, including inducing and reducing inflammation [2] Circulating inflammatory cytokines have been implicated in many classes of diseases, including cancers [3], cardiovascular diseases [4], metabolic diseases [5], autoimmune diseases [6] and neuropsychiatric disorders [7]. Their utility goes beyond their explanatory power in disease mechanism; since measuring their blood levels is a simple procedure, they can be useful in their diagnostic and predictive power. For example, they can be used as indicators for obesity and early cancer risk factors [8]. In addition to their involvement in disease, cytokines are also involved in both physiological function e.g. pain [9] and mental, cognitive, or brain function [10]. The latter point might be extremely important given emerging evidence for the links between immune function and psychiatric disorders [11–13].

Despite their relevance to disease mechanism and diagnostic power, only few studies have examined the genetic architecture of circulating inflammatory markers[14–17]. Furthermore, previous studies have mainly used adult samples; thus, it is unclear whether the genetic control of inflammatory markers varies across age groups. Here, we estimate the genetic contribution to variation in the circulating cytokine marker levels at birth for: interleukin 8 and 18 (IL8 and IL18), monocytes chemoattractant protein (MCP1 aka CCL2), thymus and reactivation regulated chemokine (TARC, also known as CCL17), erythropoietin (EPO), immunoglobulin A (IgA), C-reactive protein (CRP), brain-derived neurotrophic factor (BDNF), vascular endothelial growth factors (VEGFA) and S100 calcium-binding protein (S100B). Of note, the genetics of S100B has not been previously studied.

Results

We use data from 12,000 neonatal blood spots as part of the Danish iPSYCH Initiative[18], in which the concentrations of ten cytokines were measured using a two-step design with a discovery sample (N = 10,000) and a replication sample (N = 2,000). Five of the ten markers, i.e. BDNF, IL8, IL18, MCP1 and S100B, were measured in both samples. Both discovery and replication samples included subjects tested at birth who later in life had at least one inpatient or outpatient hospital discharge code involving one or more of six psychiatric disorders: schizophrenia, bipolar disorder, depression, autism, attention-deficit/hyperactivity disorder and anorexia (S1 Table)[18], as well as a random population sample.

Genome-wide genotyping of DNA extracted from neonatal blood spots was accomplished using the Infinium PsychChip v1.0 array in 23 waves (for detailed protocol see Pedersen et al.[18]) and used to impute ~9 million 1000 Genomes Project Phase 3 SNPs. We performed two rounds of strict quality control to remove possible technical artifacts within each wave and across waves, respectively (Materials and Methods). We inferred the ancestry of each subject using both national birth register data and genomic principal component (PC) analysis. Non-Danish subjects were subsequently removed before the genetic association analyses. In total, 8,318 and 1,141 subjects were used in the discovery and replication analyses, respectively (Materials and Methods). Marker levels were log-transformed and age-residualized using a generalized additive model with 5 degrees of freedom (hereafter normalized, Materials and Methods). As expected, we observed both positive and negative correlations of the measured marker levels; correlation coefficients range from -0.06 to 0.43, but positive correlation is observed in the majority of the cases (S33 Fig). This suggests a complex regulation mechanism for immune responses.

We first estimated the proportions of the variance of marker levels accounted for by genetic variants (h²_SNP) using restricted maximum likelihood[19]. S100B shows the highest h²_SNP (0.73), while EPO has h²_SNP~0; (Fig 1A). SNP-heritabilities for the remaining eight markers range from 0.08 (BDNF) to 0.21 (IL18 and VEGFA). For each marker, h²_SNP was partitioned to autosomes, revealing that SNP-heritabilities of S100B, CRP, IL18 and IgA predominantly stem from the chromosomes where their coding genes are located (Fig 1B), suggesting strong cis-regulatory mechanisms. In contrast, analyses suggest disperse and polygenic trans-regulation for IL8, MCP1 and TARC (Fig 1B).

**Fig. 1. SNP heritability of circulating protein levels.**

We correlated blood marker levels with polygenic risk scores (PGRSs) constructed from effect size estimates reported in a previous, independent study[17]. As shown in Fig 1C, moderately high correlations were observed for VEGFA, IL18, MCP1 (r = 0.31, p<10⁻¹⁶; r = 0.18, p <10⁻¹⁶; r = 0.19, p<10⁻¹⁶; respectively, using SNPs with association p<10⁻⁶ from the independent study), whereas IL8 blood levels and PGRSs are only marginally correlated.

We performed a genome-wide association study for each cytokine using a multiple linear regression model, including the first 6 principal components (PCs), diagnosis of any of the six disorders, genotyping wave indicators and sex as covariates (Materials and Methods). The same model was separately applied to both discovery and replication samples. We did not observe inflation in the resulting association statistics (lambda: min = 0.99, max = 1.03; S1–S10 Figs). Except for the cases of BDNF and IL8, we observed a high number genetic variants significantly associated (P<5x10^-9) with the cytokine markers, ranging from 131 for EPO to 3,941 for S100B. Extreme p values (P < 10⁻¹⁰⁰) are especially common for IL18, S100B and VEGFA (Fig 2A) in line with analyses showing strong cis-regulatory mechanisms for these markers (Fig 1B). As shown in Fig 2B, common variants (minor allele frequency: MAF>20%) make up 56% of all significant variants.

**Fig. 2. Distribution of association statistics for inflammation marker level a.**

We clumped the association signals into 20 independent loci (16 unique loci) indexed by at least one significant SNP (P<5 x 10⁻⁹) (Materials and Methods, S11–S30 Figs), associated with one or more markers (Table 1). Out of the 20 associations, four are novel and confirmed in the replication study (P<0.0036, Table 1). The first novel association with IL18 is in 19p13.2 indexed by rs56195122 (P = 2.4x10^-13, MAF = 0.03, replication P = 6.59x10^-4, S17 Fig). The SNP rs56195122 is in the first intron of the synaptonemal complex central element protein 2 gene (SYCE2), associated with several blood metabolites levels[21] and blood cell related traits[22, 23]. The second locus is associated with MCP1, indexed by rs4493469 (P = 1.62x10-16, MAF = 0.1, replication P = 2.0x10-3, 27kb upstream of the C-C chemokine receptor type 3 gene, CCR3, S20 Fig).

**Tab. 1. Genome wide significant associations with blood inflammatory marker levels.**

Genome wide significant SNPs were clumped into 20 independent regions. Information about each region includes, leading SNP (SNP), Chromosome (Chr), cytoband (Region), genomic position (Pos, hg19), effective allele (A1), alternative allele (A2), effect size (Beta), standard error (Se), association p values (P), proportion of maker level variance accounted for by the leading SNP (VE), imputation quality score (INFO), association p values in the replication sample or previously reported studies (P repl) and the gene closest to the leading SNPs (Gene).

We discovered two novel regions associated with S100B levels in blood (Table 1). The first region at 21q22.3 is indexed by rs62224256 (P<1x10^-300, MAF = 0.49, replication P = 9.58x10^-38, S23 Fig) located 21kb downstream of the pericentrin gene (PCNT), which is a calmodulin binding protein. Remarkably, the leading SNP accounts for 18% of the variation of S100B level in blood in the discovery sample (Table 1). The A allele of the top SNP rs62224256 is associated with reduced levels corresponding to 0.32 standard deviations (SD = 0.02, P<2x10^-16) and explains 14% of S100B variation in the replication sample (Fig 3A). We also discovered that the human leukocyte antigen (HLA) region (build hg19, chr6: 28,477,797–33,448,354) is associated with the variation of circulating S100B, led by rs28397289 (P = 7.67x10^-19, MAF = 0.24, S24 Fig). The association of the HLA region with S100B in the replication sample is indexed by another SNP rs4713462 (replication P = 5.83x10^-5, MAF = 0.30).

Additionally, 14 of the 20 loci replicated previously-reported associations (S1 Text and S10 Table).

**Fig. 3. Prediction of inflammation marker levels by genetic variants.**

We constructed PGRSs for: BDNF, IL18, IL8, MCP1 and S100B, measured in both samples, for the replication analysis using the effect estimates from discovery association. Fig 3B shows Pearson’s correlations between the PGRSs and the corresponding normalized marker levels stratified by different “discovery association strength”. The PGRSs based on SNPs with P<10⁻⁶ (S1) were correlated most strongly across all markers except IL8. In contrast, PGRS constructed with all SNPs (S8), i.e. P≤<1.0, show no significant correlation except for with S100B. The PGRSs constructed with SNPs with P<10⁻⁶ accounts for 21% of S100B variation in the replication sample (Fig 3C), and the correlation between PGRS and S100B levels is ~0.5. For comparison, an analysis based on a previously-studied discovery sample[17] is shown in S31 and S32 Figs. The observed low correlations between PGRS and IL8 levels (Figs 1C, 3B, S31 and S32 Figs) can be partially explained by the low SNP heritability for IL8 estimated in our sample (Fig 1A). On the other hand, the most significant correlations for the other cytokines was achieved by the PGRS with the lowest p-value threshold indicate that the genetic architecture of cytokines may be less polygenic than other human complex traits.

The associated loci contain a large number of genome-wide significant SNPs (P<5x10^-9, Fig 2A), making it challenging to infer the causal variants for follow-up experimental studies. We performed Bayesian statistical fine-mapping on each associated region[24] (Materials and Methods). For each associated region, we inferred the most probable causal configuration (causal set) assuming at most 3 causal variants per region (Materials and Methods). As shown in Table 2, eleven causal sets include their corresponding leading SNPs, among which 3 are one-variant sets. Nonetheless, 9 causal sets do not contain their corresponding leading SNPs, indicating that the top association signals may be driven by the allelic combination of SNPs in the causal sets (Table 2). Re-analysis assuming at most six causal variants per region did not change the results (S9 Table).

**Tab. 2. Fine mapping of associated regions.**

The FINEMAP program was applied to each associated region (500kb left and right of the leading SNP) in Table 1, assuming each region contains at most three causal variants. Abbreviations used: log₁₀(BFc): common logarithm of Bayesian factor for the inferred most probable causal configuration; log₁₀(BF): common logarithm of Bayesian factor for the SNP being in the causal set; PIP: posterior inclusion probability in the causal set; R²: LD r-square of the SNP with the leading SNP in the corresponding region; P: association p value in the discovery sample; P repl: association p value in the replication sample or previous studies; Gene: closest gene to the corresponding SNP; Enh gene: inferred genes regulated by the corresponding enhancer.

Most of the identified genetic variants are located outside of protein-coding regions. We integrated associated loci with public epigenomic datasets[25–28] to infer plausible regulatory mechanisms. Eighteen of the 50 identified leading SNPs implicated by both association and fine-mapping analyses are located in enhancers from GeneHancer, the GeneCards Suite[29] database of human enhancers and their associated genes (Materials and Methods)(Table 2 and S2–S8 Tables). We also tested whether cytokine associated SNPs were enriched in DNAse hypertensive sites, histone modification sites and chromatin states. However, after correcting multiple testing no significant enrichment was observed (S34–S36 Figs). In Fig 4, we demonstrate the annotation by the 21q22.3 region, indexed by rs62224256, associated with S100B level. The SNPs rs11910707 (P = 1.26x10^-205, replication P = 1.66x10^-27, log₁₀BF = 13.25, 12kb upstream of PRMT2) and rs2839314 (P = 188x10^-240, replication P = 4.30x10^-19, log₁₀BF = 4.1, 22kb upstream of DIP2A) are the most probable causal variants. The rs11910707 SNP overlaps with the elite enhancer (Materials and Methods) GH21G046620, and rs2839314 –with GH21G046541. Both enhancers modulate the transcription of the S100B gene (the former through a double-elite association). Moreover, among the genes regulated by at least one of these enhancers are PRMT2, DIP2A, and SPATC1L (Table 2). Thus, the most highly associated signal with rs62224256 is highly likely to be a proxy of the two causal SNPs. As such, the closest gene, PCNT, may or may not play roles in the regulation of circulating S100B.

**Fig. 4. Annotation of the region indexed by rs62224256 associated with S100B.**

Discussion

In this study, we investigate the genetic architecture of ten cytokines in whole blood at birth, in a sample of 12,000 individuals, the largest study so far. Our results highlight an important role for regulatory elements in determining levels of circulatory inflammatory markers. Importantly, we robustly replicate our findings in an in-house replication sample and by using data from other studies[16, 17]. The latter studies, in contrast to the current study, were based on adult samples, and, therefore, our results suggest that the genetic architecture of cytokines is stable from neonatal to adult life.

Inflammation and conditions associated with it, such as infections and autoimmune diseases, have been implicated in a number of disorders and medical conditions[1], including mental disorders[7, 11, 13]. In the context of the latter type of disorders, studies such as ours could be of great utility; while it has been known for a long time that mental disorders have strong genetic etiologies [30], when it comes to reliable accounts of disease mechanism, our current understanding is very limited compared to not only monogenic disorders, but also other complex disorders such as autoimmune disorders [31, 32]. This is not necessarily due to lack of significant genetic associations, e.g. for schizophrenia [33], but rather it could also stem from the difficulty in defining the psychiatric traits. In this respect, leveraging the results of studies such as ours could be useful for both diagnosis and as a future avenue for research; given the links between inflammation, immunity and mental illness, and the properties of some of the inflammatory makers studied here, it could be envisaged that the latter could be used in a way similar to how endophenotypes could be used in psychiatry[34, 35]. Moreover, the intricate genetic architecture identified in this study, which highlights gene regulation, could be informative to molecular studies of psychiatric diseases and other types of diseases. For example, it is likely to prompt studies using e.g. Mendelian randomization[36] to investigate the relationship between inflammatory markers and complex disease.

The main strengths of our study are the large number of markers included, the large sample size, and the replication sample (S37 Fig). The postnatally sampling on days 5–7 day renders our findings relatively independent of the child’s behavior and natural environment, which could be considered a major strength. However, it should be noted that the marker levels may, in some cases, be influenced by perinatal complications, diseases and medication administered to the child, as well as by the smoking habits, alcohol consumption, diet, weight and other general life conditions of the mother. Certain peptides, e.g. antibodies, cross the placenta, and neonatal levels in the child therefore reflect those of the mothers at birth, thus reducing the power of the study and accounting for the zero heritability of IL8 and BDNF. A possible source of noise in the levels of inflammatory markers is that measurements come from dried, whole blood samples that may not precisely correspond to concentrations measures in plasma or serum in practice. However, our replication of findings from adult samples suggests that these putative biases do not present a serious limitation to the study.

In conclusion, our study sheds some light on the complex genetic architecture of inflammatory markers and highlights the important role of regulatory elements therein. We also show that the mechanisms involved are relatively stable throughout life, by comparing our results to those of studies which used adult samples. We hope that these results will prompt future studies looking into the links between inflammation and complex diseases and, in particular, that they will contribute to investigations into the mechanisms of mental illness, which have proven difficult to explain from a molecular perspective.

Materials and methods

Sample

The sample was based on complete and consecutive birth cohorts of all singletons born in Denmark between May 1, 1981 and December 31, 2005. Only individuals who were residents in Denmark on their first birthday and who have a known mother (N = 1,536,309) were included. From this group, 78,000 subjects were genotyped in 23 waves by the Broad Institute using the PsychChip version 1. For the discovery sample, 10,000 subjects were randomly selected from the 23 waves of the iPSYCH initiative[18]. For the replication sample 2,000 subjects were chosen from the second wave, excluding the discovery sample (for detailed description of samples see S1 Table).

Cytokine level measurements

The 2000 samples for replication analysis were measured using Luminex technology as described by Skogstrand et al.[37, 38]. The second 10 000 samples used for discovery study were measured using Meso-Scale technology as described in Skogstrand et al.[39]. Briefly, dried blood spot sample were punched as 3.2mm disks into PCR-plates (Sarstedt, 72.1981.202). 130 μl extraction buffer (PBS containing 1% BSA and 0,5%Tween-20) were added to each well, and the samples were extracted in 1 hour at room temperature on a microwell shaker set at (900rpm). The extracts were manually moved to sterile Matrix 2D tubes (Thermo Scientific, 3232) and frozen at -80°C. One (Luminex) or two (Meso-Scale) years later, samples were thawed and analyzed using either Luminex technology in-house assays or Meso-Scale plates printed customized for the project. Analyte concentrations were calculated from the calibrator curves on each plate using 5PL (Luminex) or 4PL (Meso-Scale) logistic regression. Analytes falling below the lowest concentration within the working range were assigned to that value.

The measured levels were first inspected for potential outliers by scatter plots. Then, each marker level was logarithm transformed and age-residualized using a generalized additive model with 5 degrees of freedom, using the R function ‘gam’. The resultant data was further checked for normality and outliers.

Quality control and imputation

Quality control, and imputation were performed for each wave separately. The quality control parameters for retaining SNPs and subjects were: SNP missingness≤0.05 (before sample removal); subject missingness ≤ 0.02; autosomal heterozygosity deviation (| Fhet | ≤ 0.2); SNP missingness≤0.02 (after sample removal); and, SNP Hardy-Weinberg equilibrium (P > 10⁻⁶). Genotype imputation was performed using the pre-phasing/imputation stepwise approach implemented in IMPUTE2[40]/ SHAPEIT2[41](chunk size of 3 Mb and with default parameters). The imputation reference set consisted of 2,186 phased haplotypes from the full 1000 Genomes Project Phase 3. Only autosome chromosomes were analyzed.

After imputation, we identified SNPs with high imputation quality (INFO ≥ 0.1) and minor allele frequency (MAF > 0.01). Imputed dataset across 22 waves were merged and further quality control measures were applied (min INFO ≥ 0.1 and MAF ≥ 0.01). The best-guess genotypes were called using parameters: INFO ≥ 0.9 and MAF > = 0.05. The set of SNPs after linkage disequilibrium pruning (r2 ≥ 0.02) was used for relatedness testing and population structure analysis. PLINK[42] was used for relatedness testing. One random member of a pair of subjects with pi-hat ≥ 0.2 were removed. Principal component analysis was performed using EIGENSOFT[43] with the same collection of autosomal SNPs. After quality control, 8,318 subjects remained for discovery and 1,141 subjects for replication sample. In total, about 9 million SNPs were used in the association study.

SNP heritability, h²_SNP

The merged genotypes for discovery sample were quality controlled using the same parameter as above. Before estimating the heritability, SNPs were thinned by the PLINK³⁸ using the command:—indep-pairwise 100 50 0.2. The first 6 PCs (see next section), genotyping wave indicators and sex were used as covariates in the restricted maximum likelihood-based program BOLT-REML[19]. To estimate per-chromosome SNP heritability, SNPs located in the focal chromosome was removed and the estimated h²_SNP was subtracted from the whole genome estimates.

Genome-wide association

Genome-wide association study of SNPs with inflammation marker levels were performed separately for the discovery and replication sample using a multiple linear regression model implemented in PLINK[42]. Principal components were computed separately for discovery and replication, and the first 6 principal components were used as covariates, along covariates for sex and wave indicator variables. We employed the first 6 PCs following regression analyses testing each PC and each cytokine until we reached a PC which was not associated (P>0.05) with any of the 10 cytokines. Manhattan plots in S1–S8 Figs presented the association results. The genomic inflation factors were estimated and shown in the quantile-quantile plots in S1–S10 Figs. The regional association results were constructed using LocusZoom[44](S11–S30 Figs). The phenotypic variance explained by a SNP was estimated by the (β*2*p*(1−p))2, where β is the estimated effect and p the allele frequency in the discovery sample.

Associated regions and genes

Association results were ‘clumped’ using PLINK based on the linkage disequilibrium structure of the 1000 Genomes projects phase 3 EUR dataset, with parameters–clump-p1 5e-9 –clump-2 1e-6 –clump-r2 0.1. Five hundred kilo-base (kb) were used as inter-region distance threshold. Genes whose genomic coordinates located within the boundaries of each region were assigned to the corresponding region. SNPs with the smallest association p values were taken as the leading SNP for the corresponding region. The associated SNPs were annotated to the closest genes by genomic position the Ensembl tool VEP[45] (S2–S8 Tables).

Fine mapping

Association regions were fine-mapped using the FINEMAP[24] program. Regions were defined as genomic segments 500kb on both sides of the most significant SNP in an associated region (P < 5x10^-9). Linkage disequilibrium data from the 1000 Genomes Project phase 3 European sample were used in fine mapping. We performed two analyses: the first set the maximum number of causal variants to 3 and the other to 6. S2–S8 Tables listed all SNPs with posterior inclusion probability (PIP) > 0.1 for 3-causal variants analysis. S9 Table listed all inferred causal SNPs in each region for 6-causal variants analysis. The log₁₀ Bayesian factors for the causal set (log10BFc) and for each SNP are shown in the tables along with association statistics.

Enhancer annotation

The associated SNPs were mapped onto genomic enhancer regions from the GeneHancer database (v4.5) [25] using a specially-prepared annotated dataset. The GeneHancer database contains enhancers that were integrated from five enhancer sources (Ensembl[46], ENCODE[47], VISTA[48], dbSUPER[49] and FANTOM[50]) and enhancer-gene connections that are based on five methods (eQTLs[51], eRNAs[50], TF-gene expression correlations, capture-HiC[52], and genomic distance from TSS). Double-elite associations are considered to be more confident annotations and are defined as enhancer-gene connections for which both the enhancer itself and the connection to the gene are supported by at least two sources or methods, respectively.

Polygenic risk scoring

We computed the polygenic risk scores (PGRS) for both discovery and replication samples. To compute the effect size: for discovery sample, we used the association results from the previous study[17]; and, for the replication sample, we used both the association results from discovery sample and the same previous study. The association summary statistics were first carefully filtered by removing SNPs with: MAF < 0.05 or INFO < 0.8 or having a multi-character allele. We, then, clumped the resultant data based on the 1000 Genomes Project 3 EUR linkage disequilibrium structure using the program PLINK[42] with parameters:—clump-p1 1.0,—clump-p2 1.0,—clump-r2 0.1 and—clump-kb 500. The same program was used for scoring each subject in our sample. The correlations between normalized marker levels and PGRS were computed using the R program with the cor.test for the Pearson’ correlation. The proportions of the variance explained for each marker by each PGRS was computed as the square of the Pearson’s correlation coefficients.

Supporting information

S1 Text [pdf]
Additional analyses performed.

S1 Table [pdf]
Description of samples.

S2 Table [xlsx]
Full annotation results for CRP level using FINEMAP, Ensemble VEP, HaploReg, Enhancer and Sherlock.

S3 Table [xlsx]
Full annotation results for EPO level using FINEMAP, Ensemble VEP, HaploReg, Enhancer and Sherlock.

S4 Table [xlsx]
Full annotation results for IL18 level using FINEMAP, Ensemble VEP, HaploReg, Enhancer and Sherlock.

S5 Table [xlsx]
Full annotation results for MCP1 level using FINEMAP, Ensemble VEP, HaploReg, Enhancer and Sherlock.

S6 Table [xlsx]
Full annotation results for S100B level using FINEMAP, Ensemble VEP, HaploReg, Enhancer and Sherlock.

S7 Table [xlsx]
Full annotation results for TARC level using FINEMAP, Ensemble VEP, HaploReg, Enhancer and Sherlock.

S8 Table [xlsx]
Full annotation results for VEGFA level using FINEMAP, Ensemble VEP, HaploReg, Enhancer and Sherlock.

S9 Table [xlsx]
FINEMAP results for regions associated with S100B assuming six causal variants.

S10 Table [xlsx]
Replication of SNPs identified by present study with those reported by Ahola-Olli et al.

S1 Fig [pdf]
The Manhattan & qq plots for BDNF level.

S2 Fig [pdf]
The Manhattan & qq plots for IL8 level.

S3 Fig [pdf]
The Manhattan & qq plots for CRP level.

S4 Fig [pdf]
The Manhattan & qq plots for EPO level.

S5 Fig [pdf]
The Manhattan & qq plots for IgA level.

S6 Fig [pdf]
The Manhattan & qq plots for IL18 level.

S7 Fig [pdf]
The Manhattan & qq plots for MCP1 level.

S8 Fig [pdf]
The Manhattan & qq plots for S100B level.

S9 Fig [pdf]
The Manhattan & qq plots for TARC level.

S10 Fig [pdf]
The Manhattan & qq plots for VEGFA level.

S11 Fig [pdf]
Region plot for CRP-rs3091244 association.

S12 Fig [pdf]
Region plot CRP rs112635299 association.

S13 Fig [pdf]
Region plot EPO rs1130864 association.

S14 Fig [pdf]
Region plot IgA rs3094087 association.

S15 Fig [pdf]
Region plot IL18 rs10891329 association.

S16 Fig [pdf]
Region plot IL18 rs10891268 association.

S17 Fig [pdf]
Region plot IL18 rs56195122 association.

S18 Fig [pdf]
Region plot IL18 rs9402686 association.

S19 Fig [pdf]
Region plot MCP1 rs12075 association.

S20 Fig [pdf]
Region plot MCP1 rs4493469 association.

S21 Fig [pdf]
Region plot MCP1 rs2228467 association.

S22 Fig [pdf]
Region plot MCP1 rs60200069 association.

S23 Fig [pdf]
Region plot S100B rs62224256 association.

S24 Fig [pdf]
Region plot S100B rs28397289 association.

S25 Fig [pdf]
Region plot TARC rs115952894 association.

S26 Fig [pdf]
Region plot TARC rs2228467 association.

S27 Fig [pdf]
Region plot TARC rs10886430 association.

S28 Fig [pdf]
Region plot TARC rs223896 association.

S29 Fig [pdf]
Region plot VEGFA rs7767396 association.

S30 Fig [pdf]
Region plot VEGFA rs11789392 association.

S31 Fig [pdf]
Polygenic score for discovery sample based on Ahola-Olli et al.

S32 Fig [pdf]
Polygenic score for replication sample based on Ahola-Olli et al.

S33 Fig [pdf]
Pearson's correlation among inflammation markers.

S34 Fig [pdf]
Enrichment of genetic variants with DNAse hypersensitive sites.

S35 Fig [pdf]
Enrichment of genetic variants with histone modification.

S36 Fig [pdf]
Enrichment of genetic variants with chromatin state.

S37 Fig [pdf]
Power analysis of discovery and replication samples.

Zdroje

1. Lackie J. Dictionary of Biomedicine—Oxford Reference. 1 ed. Oxford, UK: Oxford University Press; 2010.

2. Zhang J-M, An J. Cytokines, Inflammation, and Pain. Int Anesthesiol Clin. 2007;45(2):27–37. doi: 10.1097/AIA.0b013e318034194e 17426506

3. Dranoff G. Cytokines in cancer pathogenesis and cancer therapy. Nat Rev Cancer. 2004;4(1):11–22. doi: 10.1038/nrc1252 14708024

4. Braunwald E. Biomarkers in heart failure. N Engl J Med. 2008;358(20):2148–59. doi: 10.1056/NEJMra0800239 18480207

5. Hotamisligil GS. Inflammation, metaflammation and immunometabolic disorders. Nature. 2017;542:177–85. doi: 10.1038/nature21363 28179656

6. Neurath MF. Cytokines in inflammatory bowel disease. Nat Rev Immunol. 2014;14(5):329–42. doi: 10.1038/nri3661 24751956

7. Najjar S, Pearlman DM, Alper K, Najjar A, Devinsky O. Neuroinflammation and psychiatric illness. J Neuroinflammation. 2013;10:43. doi: 10.1186/1742-2094-10-43 23547920

8. Kim S, Keku TO, Martin C, Galanko J, Woosley JT, Schroeder JC, et al. Circulating Levels of Inflammatory Cytokines and Risk of Colorectal Adenomas. Cancer Res. 2008;68(1):323–8. doi: 10.1158/0008-5472.CAN-07-2924 18172326

9. da Cruz Fernandes IM, Pinto RZ, Ferreira P, Lira FS. Low back pain, obesity, and inflammatory markers: exercise as potential treatment. J Exerc Rehabil. 2018;14(2):168–74. doi: 10.12965/jer.1836070.035 29740548

10. Pollmächer T, Haack M, Schuld A, Reichenberg A, Yirmiya R. Low levels of circulating inflammatory cytokines—Do they affect human brain functions? Brain Behav Immun. 2002;16(5):525–32. doi: 10.1016/s0889-1591(02)00004-1 12401466

11. Nudel R, Benros ME, Krebs MD, Allesøe RL, Lemvigh CK, Bybjerg-Grauholm J, et al. Immunity and mental illness: findings from a Danish population-based immunogenetic study of seven psychiatric and neurodevelopmental disorders. Eur J Hum Genet. 2019;27(9):1445–55. doi: 10.1038/s41431-019-0402-9 30976114

12. Nudel R, Wang Y, Appadurai V, Schork AJ, Buil A, Agerbo E, et al. A large-scale genomic investigation of susceptibility to infection and its association with mental disorders in the Danish population. Transl Psychiatry. 2019;9(1):283. doi: 10.1038/s41398-019-0622-3 31712607

13. Liu X, Nudel R, Thompson WK, Appadurai V, Schork AJ, Buil A, et al. Genetic factors underlying the bidirectional relationship between autoimmune and mental disorders–findings from a Danish population-based study. Brain Behav Immun. 2020. doi: 10.1016/j.bbi.2020.06.014 32534018

14. Dehghan A, Dupuis J, Barbalic M, Bis JC, Eiriksdottir G, Lu C, et al. Meta-analysis of genome-wide association studies in> 80 000 subjects identifies multiple loci for C-reactive protein levels. Circulation. 2011;123(7):731–8. doi: 10.1161/CIRCULATIONAHA.110.948570 21300955

15. Kettunen J, Tukiainen T, Sarin AP, Ortega-Alonso A, Tikkanen E, Lyytikainen LP, et al. Genome-wide association study identifies multiple loci influencing human serum metabolite levels. Nat Genet. 2012;44(3):269–76. doi: 10.1038/ng.1073 22286219

16. Suhre K, Arnold M, Bhagwat AM, Cotton RJ, Engelke R, Raffler J, et al. Connecting genetic risk to disease end points through the human blood plasma proteome. Nat Commun. 2017;8:14357. doi: 10.1038/ncomms14357 28240269

17. Ahola-Olli AV, Wurtz P, Havulinna AS, Aalto K, Pitkanen N, Lehtimaki T, et al. Genome-wide Association Study Identifies 27 Loci Influencing Concentrations of Circulating Cytokines and Growth Factors. Am J Hum Genet. 2017;100(1):40–50. doi: 10.1016/j.ajhg.2016.11.007 27989323

18. Pedersen CB, Bybjerg-Grauholm J, Pedersen MG, Grove J, Agerbo E, Baekvad-Hansen M, et al. The iPSYCH2012 case-cohort sample: new directions for unravelling genetic and environmental architectures of severe mental disorders. Mol Psychiatry. 2018;23(1):6–14. doi: 10.1038/mp.2017.196 28924187

19. Loh PR, Tucker G, Bulik-Sullivan BK, Vilhjalmsson BJ, Finucane HK, Salem RM, et al. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat Genet. 2015;47(3):284–90. doi: 10.1038/ng.3190 25642633

20. Choi SH, Ruggiero D, Sorice R, Song C, Nutile T, Vernon Smith A, et al. Six Novel Loci Associated with Circulating VEGF Levels Identified by a Meta-analysis of Genome-Wide Association Studies. PLoS Genet. 2016;12(2):e1005874. doi: 10.1371/journal.pgen.1005874 26910538

21. Shin SY, Fauman EB, Petersen AK, Krumsiek J, Santos R, Huang J, et al. An atlas of genetic influences on human blood metabolites. Nat Genet. 2014;46(6):543–50. doi: 10.1038/ng.2982 24816252

22. Astle WJ, Elding H, Jiang T, Allen D, Ruklisa D, Mann AL, et al. The Allelic Landscape of Human Blood Cell Trait Variation and Links to Common Complex Disease. Cell. 2016;167(5):1415–29 e19. doi: 10.1016/j.cell.2016.10.042 27863252

23. Goldstein JI, Jarskog LF, Hilliard C, Alfirevic A, Duncan L, Fourches D, et al. Clozapine-induced agranulocytosis is associated with rare HLA-DQB1 and HLA-B alleles. Nat Commun. 2014;5:4757. doi: 10.1038/ncomms5757 25187353

24. Benner C, Spencer CC, Havulinna AS, Salomaa V, Ripatti S, Pirinen M. FINEMAP: efficient variable selection using summary data from genome-wide association studies. Bioinformatics. 2016;32(10):1493–501. doi: 10.1093/bioinformatics/btw018 26773131

25. Fishilevich S, Nudel R, Rappaport N, Hadar R, Plaschkes I, Iny Stein T, et al. GeneHancer: genome-wide integration of enhancers and target genes in GeneCards. Database: the journal of biological databases and curation. 2017;2017:bax028. doi: 10.1093/database/bax028 28605766

26. Kundaje A, Meuleman W, Ernst J, Bilenky M, Yen A, Heravi-Moussavi A, et al. Integrative analysis of 111 reference human epigenomes. Nature. 2015;518(7539):317–30. doi: 10.1038/nature14248 25693563

27. Consortium GTEx. Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science. 2015;348(6235):648–60. doi: 10.1126/science.1262110 25954001

28. Kavanagh D, Dwyer S, O'Donovan M, Owen M. The ENCODE project: implications for psychiatric genetics. Mol Psychiatry. 2013;18(5):540–2. doi: 10.1038/mp.2013.13 23478746

29. Stelzer G, Rosen N, Plaschkes I, Zimmerman S, Twik M, Fishilevich S, et al. The GeneCards Suite: From Gene Data Mining to Disease Genome Sequence Analyses. Curr Protoc Bioinformatics. 2016;54:1.30.1–1..3. doi: 10.1002/cpbi.5 27322403

30. Gelernter J. Genetics of Complex Traits in Psychiatry. Biol Psychiatry. 2015;77(1):36–42. doi: 10.1016/j.biopsych.2014.08.005 25444161

31. Rosenblum MD, Remedios KA, Abbas AK. Mechanisms of human autoimmunity. J Clin Invest. 2015;125(6):2228–33. doi: 10.1172/JCI78088 25893595

32. Simmonds MJ, Gough SCL. Genetic insights into disease mechanisms of autoimmunity. Br Med Bull. 2005;71(1):93–113. doi: 10.1093/bmb/ldh032 15701924

33. Schizophrenia Working Group of the Psychiatric Consortium. Biological insights from 108 schizophrenia-associated genetic loci. Nature. 2014;(511):421–7. doi: 10.1038/nature13595 25056061

34. Bearden CE, Freimer NB. Endophenotypes for psychiatric disorders: ready for primetime? Trends Genet. 2006;22(6):306–13. doi: 10.1016/j.tig.2006.04.004 16697071

35. Iacono WG. Endophenotypes in psychiatric disease: prospects and challenges. Genome Med. 2018;10(1):11. doi: 10.1186/s13073-018-0526-5 29471866

36. Davey Smith G, Ebrahim S, Lewis S, Hansell AL, Palmer LJ, Burton PR. Genetic epidemiology and public health: hope, hype, and future prospects. Lancet (London, England). 2005;366(9495):1484–98.

37. Skogstrand K. Multiplex assays of inflammatory markers, a description of methods and discussion of precautions—Our experience through the last ten years. Methods (San Diego, Calif). 2012;56(2):204–12. doi: 10.1016/j.ymeth.2011.09.025 22001645

38. Skogstrand K, Thorsen P, Norgaard-Pedersen B, Schendel DE, Sorensen LC, Hougaard DM. Simultaneous measurement of 25 inflammatory markers and neurotrophins in neonatal dried blood spots by immunoassay with xMAP technology. Clin Chem. 2005;51(10):1854–66. doi: 10.1373/clinchem.2005.052241 16081507

39. Skogstrand K, Hagen CM, Borbye-Lorenzen N, Christiansen M, Bybjerg-Grauholm J, Baekvad-Hansen M, et al. Reduced neonatal brain-derived neurotrophic factor is associated with autism spectrum disorders. Transl Psychiatry. 2019;9(1):252. doi: 10.1038/s41398-019-0587-2 31591381

40. Howie B, Fuchsberger C, Stephens M, Marchini J, Abecasis GR. Fast and accurate genotype imputation in genome-wide association studies through pre-phasing. Nat Genet. 2012;44(8):955–9. doi: 10.1038/ng.2354 22820512

41. Delaneau O, Marchini J, Zagury J-F. A linear complexity phasing method for thousands of genomes. Nat Methods. 2012;9(2):179–81.

42. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81(3):559–75. doi: 10.1086/519795 17701901

43. Patterson N, Price AL, Reich D. Population structure and eigenanalysis. PLoS Genet. 2006;2(12):e190. doi: 10.1371/journal.pgen.0020190 17194218

44. Pruim RJ, Welch RP, Sanna S, Teslovich TM, Chines PS, Gliedt TP, et al. LocusZoom: regional visualization of genome-wide association scan results. Bioinformatics. 2010;26(18):2336–7. doi: 10.1093/bioinformatics/btq419 20634204

45. McLaren W, Gil L, Hunt SE, Riat HS, Ritchie GR, Thormann A, et al. The Ensembl Variant Effect Predictor. Genome Biol. 2016;17(1):122. doi: 10.1186/s13059-016-0974-4 27268795

46. Zerbino DR, Wilder SP, Johnson N, Juettemann T, Flicek PR. The ensembl regulatory build. Genome Biol. 2015;16:56. doi: 10.1186/s13059-015-0621-5 25887522

47. The ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489(7414):57–74. doi: 10.1038/nature11247 22955616

48. Visel A, Minovitsky S, Dubchak I, Pennacchio LA. VISTA Enhancer Browser—a database of tissue-specific human enhancers. Nucleic Acids Res. 2007;35(Database issue):D88–92. doi: 10.1093/nar/gkl822 17130149

49. Khan A, Zhang X. dbSUPER: a database of super-enhancers in mouse and human genome. Nucleic Acids Res. 2016;44(D1):D164–71. doi: 10.1093/nar/gkv1002 26438538

50. Andersson R, Gebhard C, Miguel-Escalada I, Hoof I, Bornholdt J, Boyd M, et al. An atlas of active enhancers across human cell types and tissues. Nature. 2014;507(7493):455–61. doi: 10.1038/nature12787 24670763

51. The Genotype-Tissue Expression (GTEx) project. Nat Genet. 2013;45(6):580–5. doi: 10.1038/ng.2653 23715323

52. Mifsud B, Tavares-Cadete F, Young AN, Sugar R, Schoenfelder S, Ferreira L, et al. Mapping long-range promoter contacts in human cells with high-resolution capture Hi-C. Nat Genet. 2015;47(6):598–606. doi: 10.1038/ng.3286 25938943

Článek A phenome-wide association study of 26 mendelian genes reveals phenotypic expressivity of common and rare variants within the general population

Článek Formal commentary

Článek A proteomic survey of microtubule-associated proteins in a R402H TUBA1A mutant mouse

Článek Inferring causal direction between two traits in the presence of horizontal pleiotropy with GWAS summary data

Článek A complementary study approach unravels novel players in the pathoetiology of Hirschsprung disease

Článek Unique genetic signatures of local adaptation over space and time for diapause, an ecologically relevant complex trait, in Drosophila melanogaster

Článek The Bric-à-Brac BTB/POZ transcription factors are necessary in niche cells for germline stem cells establishment and homeostasis through control of BMP/DPP signaling in the Drosophila melanogaster ovary

Článek No association between SCN9A and monogenic human epilepsy disorders

Článek In vivo miRNA knockout screening identifies miR-190b as a novel tumor suppressor

Článek Suppression of retinal degeneration by two novel ERAD ubiquitin E3 ligases SORDD1/2 in Drosophila

Článek Dual genome-wide CRISPR knockout and CRISPR activation screens identify mechanisms that regulate the resistance to multiple ATR inhibitors

Článek Genetic engineering of sex chromosomes for batch cultivation of non-transgenic, sex-sorted males