Fine-Mapping the Genetic Association of the Major Histocompatibility Complex in Multiple Sclerosis: HLA and Non-HLA Effects
The major histocompatibility complex (MHC) region is strongly associated with multiple sclerosis (MS) susceptibility. HLA-DRB1*15:01 has the strongest effect, and several other alleles have been reported at different levels of validation. Using SNP data from genome-wide studies, we imputed and tested classical alleles and amino acid polymorphisms in 8 classical human leukocyte antigen (HLA) genes in 5,091 cases and 9,595 controls. We identified 11 statistically independent effects overall: 6 HLA-DRB1 and one DPB1 alleles in class II, one HLA-A and two B alleles in class I, and one signal in a region spanning from MICB to LST1. This genomic segment does not contain any HLA class I or II genes and provides robust evidence for the involvement of a non-HLA risk allele within the MHC. Interestingly, this region contains the TNF gene, the cognate ligand of the well-validated TNFRSF1A MS susceptibility gene. The classical HLA effects can be explained to some extent by polymorphic amino acid positions in the peptide-binding grooves. This study dissects the independent effects in the MHC, a critical region for MS susceptibility that harbors multiple risk alleles.
Published in the journal:
. PLoS Genet 9(11): e32767. doi:10.1371/journal.pgen.1003926
Category:
Research Article
doi:
https://doi.org/10.1371/journal.pgen.1003926
Summary
The major histocompatibility complex (MHC) region is strongly associated with multiple sclerosis (MS) susceptibility. HLA-DRB1*15:01 has the strongest effect, and several other alleles have been reported at different levels of validation. Using SNP data from genome-wide studies, we imputed and tested classical alleles and amino acid polymorphisms in 8 classical human leukocyte antigen (HLA) genes in 5,091 cases and 9,595 controls. We identified 11 statistically independent effects overall: 6 HLA-DRB1 and one DPB1 alleles in class II, one HLA-A and two B alleles in class I, and one signal in a region spanning from MICB to LST1. This genomic segment does not contain any HLA class I or II genes and provides robust evidence for the involvement of a non-HLA risk allele within the MHC. Interestingly, this region contains the TNF gene, the cognate ligand of the well-validated TNFRSF1A MS susceptibility gene. The classical HLA effects can be explained to some extent by polymorphic amino acid positions in the peptide-binding grooves. This study dissects the independent effects in the MHC, a critical region for MS susceptibility that harbors multiple risk alleles.
Introduction
Across the entire human genome, the major histocompatibility complex (MHC) on chromosome 6 makes the single largest contribution to multiple sclerosis (MS) susceptibility. The classical HLA-DRB1*15:01 allele has been documented as the strongest association to MS risk, and its role has been studied and replicated extensively [1]. Numerous other HLA alleles have been suggested to be associated with MS susceptibility, but the complex structure of the MHC has made it challenging to unequivocally pinpoint variants that play a causal role in MS [1], [2]. For example, it has been suggested that DQB1*06:02, an MHC class II allele in strong linkage disequilibrium (LD) with DRB1*15:01, either has no independent effect [3] or acts in an extended haplotype with DRB1*15:01, the DRB1*15:01—DQB1*06:02 haplotype, or the DRB1*15:01—DQA1*0102—DQB1*06:02 haplotype [4], [5]. The ambiguity and the lack of replication for many of the MHC associations can be attributed to the extended LD structure of the MHC [6], the limited number of HLA loci analyzed, and the relatively small sample size of previous studies.
Thanks to a large sample size and a novel procedure to impute classical HLA alleles from SNP data, a recent study described independent MHC effects for DRB1*15:01, *03:01 and *13:03 as well as HLA-A*02:01 and rs9277535 [7]. In the present study we sought to test not only the role of classical HLA alleles but also of potentially functional variation within the HLA genes. To this end, we imputed classical alleles as well as their corresponding amino acid sequences in 8 HLA genes in a large population of 5,091 MS cases and 9,595 healthy controls, with genome-wide data (GWAS), following a recently described imputation protocol [8]. Both the samples and the imputation method used were independent of recent efforts exploring MHC associations to MS susceptibility [7].
Results
We have successfully imputed 3,613 SNPs, 202 amino acid positions, 78 classical HLA alleles at two-digit resolution, and 99 classical HLA alleles at four-digit resolution (Figure S1). Given the number of hypotheses that are tested in this analysis, we set, a priori, p<1×10−5 as the threshold for statistical significance. This threshold accounts for 5,000 independent tests, assuming a study-wide type 1 error rate (α) of 5%. Overall, we analyzed 5,091 MS cases and 9,595 healthy controls from eight different GWAS data sets (Table 1).
Multi-allelic nature of association at HLA-DRB1
The most statistically significant variant in the univariate analysis (see Material and Methods for details) was HLA-DRB1*15:01 (odds ratio [OR] = 2.92, p = 1.4×10−234, Figure 1A). Looking at each category of variants (SNPs, two-digit HLA alleles, four-digit HLA alleles and amino acid positions), the amino acid position with the smallest p-value was position −5 in the leader peptide of DQβ1 (p = 7.6×10−231), and the most statistically significant SNP was at position 32,742,280 (OR for the A allele = 2.96, p = 5.1×10−229). An equivalent effect was observed for HLA-DQB1*06:02 (OR = 2.96, p = 5.4×10−229). We first tested whether the DRB1*15:01 effect could be explained by DQB1. Adjusting for DQB1 variants, we observed that DRB1*15:01 always had a residual effect (p<10−6). Conversely, adjusting for DRB1*15:01, the effect of DQB1 variants were accounted for (p>0.8), suggesting that DRB1*15:01 had a non-equivalent and more statistically significant effect than the DQB1 variants. Furthermore, the extended DRB1*15:01—DQB1*06:02 haplotype (p = 7.5×10−231) did not improve upon the association of DRB1*15:01 alone. Similarly, the classical DQA1*01:02 allele—that was suggested to contribute to the effect of the haplotype—was strongly associated (p = 4.8×10−178), but its effect could be entirely explained by DRB1*15:01. These observations strengthen the hypothesis that the primary MHC effect in MS is mediated by DRB1*15:01 and not by variants in the DQB1 or DQA1 loci.
The DRB1 locus (all four-digit alleles in one model) had a p-value of 4.0×10−263 in the initial analysis (Figure 1B). After adjusting for DRB1*15:01, the residual DRB1 locus effect (due to all remaining DRB1 four-digit alleles) was still statistically significant (p = 3.1×10−37), indicating the presence of multiple independent DRB1 effects. Applying a forward stepwise strategy (see Materials and Methods for details), we established statistical independence for 5 additional DRB1 alleles: *03:01, *13:03, *04:04, *04:01, and *14:01 (Table S1). After controlling for the effects of all 6 significant DRB1 alleles (including *15:01), there was no evidence for a residual signal (p = 1.5×10−05). We also applied several other variant selection approaches to test the robustness of these findings; all approaches identified the same six alleles (Table S1).
HLA-A*02:01 has an independent protective effect
Having analyzed the effects at HLA-DRB1, we tested all other variation across the MHC while correcting for the six statistically independent DRB1 alleles, namely DRB1*15:01, DRB1*03:01, DRB1*13:03, DRB1*04:04, DRB1*04:01, and DRB1*14:01. The most statistically significant variant was SNP rs2844821 near HLA-A (OR for G allele = 0.70, p = 3.2×10−29, Figure 1C). Due to LD, this SNP effect is statistically equivalent to the effect of HLA-A*02:01 (OR = 0.70, p = 7.4×10−29) and amino acid Val at position 95 in the peptide-binding groove of the HLA-A protein (OR = 0.70, p = 9.6×10−29, Figure 2). Controlling for this effect, there were no other HLA-A associations.
The DPB1 association with MS susceptibility
Controlling for the 6 DRB1 alleles and the HLA-A effect, the next most statistically significant variant was rs9277489 (OR for C = 1.31, p = 2.6×10−18). This SNP is in the intronic region of HLA-DPB1 gene and in perfect LD (r2 = 1, based on HapMap Phase II) with rs9277535 that was previously associated with MS susceptibility [7], [9]. The most statistically significant HLA allele was DPB1*03:01 (p = 3.6×10−15), but the effect of rs9277489 cannot be explained by DPB1*03:01 alone (p = 1.7×10−06 for rs9277489 in the presence of DPB1*03:01). The most statistically significant amino acid mapped to position 65 of HLA-DPβ1 (OR for Leu vs. Ile = 1.37, p = 3.7×10−18), which explained the effect of rs9277489 (p = 0.003 for rs9277489 in the presence of Leu65 in HLA-DPβ1). This amino acid is also located in the peptide-binding groove of HLA-DPβ1 (Figure 2). After controlling for rs9277489, there was no residual effect at the DPB1 locus (p>1.0×10−5).
A non-classical MHC association in MICB-LST1
Adjusting also for the DPB1 effect, we identified rs2516489 as the next most statistically significant variant (OR for T = 1.31, p = 6.7×10−13, Figure 1G, Figure S2B). This SNP tags a region of extended LD containing several non-classical MHC class I, class III and cytokine genes, i.e. MICB, DDX39B (BAT1), NFKBIL1, TNF, LTA, LTB, and LST1 (Figure 3). We note that this region had no substantial effect in the univariate analysis (Figure 1A, Figure S2A), but it became genome-wide significant once the DRB1*15:01 effect was accounted for (Table S2). There was no evidence of interaction either with DRB1*1501 (Table S2) or any other of the identified effects. To explore this phenomenon further, we stratified the samples according to the presence of DRB1*15:01 into carriers (heterozygous and homozygous) and non-carriers. Univariate analysis in these two strata revealed a consistent but modest effect (OR ∼1.2) for the associated SNP in both DRB1*15:01 carriers and non-carriers (Table S2, Figure S3). This phenomenon can likely be explained by Simpson's paradox, where two subgroups share the same association but the overall population shows no association (or even a reversed one) [10]. This analysis therefore returns, for the first time, robust evidence supporting the role of non-HLA genes within the MHC.
To explore any functional consequences of the SNPs in the MICB-LST1 region we tested these SNPs for cis-eQTL (expression quantitative trait loci) effects in peripheral blood mononuclear cells (PBMCs) of 213 MS subjects [11] as well as CD4+ T cells and CD14+ monocytes of 211 healthy controls (Table S3). None of the associated SNPs had a strong cis-eQTL effect (p>1×10−5): the strongest effect in this region is the relation of rs2516489 to LST1 expression (p = 1.91×10−5) in the CD4+ T cells of healthy individuals. The next strongest effect also involved rs2516489 but was seen in relation to HCG18 (p = 3.19×10−5) in the PBMCs of MS subjects. None of the SNPs had a statistically significant cis-eQTL effect on any of the class I or II classical HLA genes (Table S3). Leveraging the publicly available Encyclopedia of DNA Elements (ENCODE) [12] and NIH Epigenomics Roadmap [13] for immune cells and cell lines it is evident that the region has an abundance of functional elements (Figure 3). Of specific interest is the non-coding naturally occurring read-through transcription between the neighboring ATP6V1G2 (ATPase, H+ transporting, lysosomal 13 kDa, V1 subunit G2) and DDX39B (DEAD box polypeptide 39B) genes. Two SNPs, rs2523512 and rs2251824, tag this element that has a strong signal in the DNase hypersensitivity assay in all immune cell types, suggesting that it is an active cis-regulatory region. The histone markers for promoters, enhancers and active elongation also support these data, while this region is identified as an active transcription start site using chromatin states [14]. Other candidates are the TNF and LTB genes. Rs2516489, the SNP with the best (but not statistically significant) cis-eQTL effects, lies within a region of heterochromatin, with no indication of regulatory potential in the available data.
Independent HLA-B effects
Adjusting for 6 classical DRB1 alleles, HLA-A*02:01, rs9277489 (HLA-DPB1 effect) and rs2516489, we observed another novel signal emerging from the HLA-B locus (p = 7.9×10−11). The most statistically significant variants were HLA-B*37, HLA-B*37:01, amino acid Ser at position 99 in HLA-B (Figure 2) and a SNP in position 31,431,006 (hg18) (Figure 1I, J). All of these variants had statistically equivalent effects (OR = 1.75, p = 2.2×10−08). Accounting for the effect of HLA-B*37:01, no other variant in HLA-B exceeded our a priori defined threshold, although the residual effect at the HLA-B locus due to all remaining classical HLA-B alleles was still statistically significant in our analysis (p = 6.5×10−06, Figure 1L). This residual association could be accounted for by HLA-B*38:01 (OR = 0.55; p = 4.1×10−05). After adding HLA-B*38:01 to the model, there was no longer evidence for a residual effect of classical HLA-B alleles (p>0.002) or elsewhere across the MHC. No amino acid position in HLA-B could explain the HLA-B*38:01 effect.
Amino acid residues in DRβ1
Next, we set out to assess whether a specific set of amino acids within the HLA-DR molecule could explain the collective effect of the six classical DRB1 alleles identified above. To this end, we tested each polymorphic amino acid position using an omnibus test (a regression model with all but one amino acids carried by a given position), adding all amino acids (but one) of the most statistically significant position to the model in a forward stepwise fashion. The most significant amino acid position in DRβ1 mapped to position 71 (p = 1.2×10−227, Figure S4A), which carries 4 possible alleles: Ala, Arg, Glu, and Lys. Controlling for the alleles at position 71 (df = 3), there was still a strong residual signal for DRB1*15:01 (p = 5.8×10−13), indicating that amino acid position 71 alone does not explain the DRB1*15:01 effect. Adjusting for the alleles at position 71, position 74 was the next most statistically significant (p = 1.2×10−16, Figure S4B). This position harbors five possible alleles: Arg, Leu, Glu, Ala and Gln. Controlling for positions 71 and 74, position 57 (with four alleles: Asp, Ser, Val or Ala) was the next most statistically significant association (p = 4.9×10−11, Figure S4C). Controlling for positions 71, 74 and 57, we found position 86 as the most statistically significant association (OR = 1.35 for Val vs. Gly, p = 1.0×10−06, Figure S4D). After controlling for these four positions, no other amino acid position exceeded our significance threshold (Figure S4E), although HLA-DRB1*15:01 still showed a residual association signal (p = 10−05). The model with the four DRβ1 amino acid positions could explain the data better than a model with only DRB1*15:01 (p = 2.6×10−26 in favor of the DRβ1 amino acid positions), but it was slightly worse than the model with the six DRB1 alleles (p = 0.001 in favor of the 6 DRB1 alleles). All four amino acid positions reside in the peptide-binding groove of the HLA-DR molecule (Figure 2; Table S4 lists the correspondence between the amino acids at these positions and the six associated classical DRB1 alleles).
Variance explained
Integrating all of the results, HLA-DRB1*15:01 accounted for 10% of the phenotypic variance in the data, whereas all 6 independent HLA-DRB1 alleles explained 11.6%. A model with all identified statistically independent effects (HLA-DRB1*15:01, HLA-DRB1*03:01, HLA-DRB1*13:03, HLA-DRB1*04:04, HLA-DRB1*04:01, HLA-DRB1*14:01, HLA-A*02:01, rs9277489/Leu65 in HLA-DPβ1, rs2516489, HLA-B*37:01, and HLA-B*38:01) accounted for 14.2% of the total variance in MS susceptibility.
Discussion
We have imputed classical alleles of HLA genes, their corresponding amino acids and SNPs across the MHC, and tested all variants for association in a large case-control collection. Our analysis corroborates the effects of DRB1 alleles other than the well-known DRB1*15:01 association. Classical alleles DRB1*03:01, *13:03, *04:04, *04:01, and *14:01 display robust, independent associations in our data. The DQB1 and DQA1 genes have been suggested to form extended haplotypes with DRB1 alleles, mostly *15:01 [4]. In our hands, the effect of DQB1*06:02 does not explain the effect of DRB1*15:01. Furthermore, the DRB1*15:01—DQB1*06:02 haplotype does not appear to explain the data as well as the effect of DRB1*15:01 alone. Based on these results, DRB1*15:01 and the remaining DRB1 alleles are better candidates than DQB1 variants for a causal role in MS susceptibility, a hypothesis that agrees with the MHC analysis of MS subjects with African origin [3]. We note that this interpretation counters evidence in favor of DQB1 from certain murine models that capture elements of human inflammatory demyelination by triggering experimental autoimmune encephalomyelitis induced with myelin-associated oligodendrocytic basic protein [15] or proteolipid protein [16].
A number of studies have highlighted the importance of class I HLA alleles in MS susceptibility, with HLA-A*02:01 being the most prominent allele [17]–[20]. Here, we replicated the HLA-A*02:01 association and attributed it to an amino acid polymorphism at position 95 in the peptide-binding groove of the HLA-A molecule. We also replicated the recently proposed DPB1*03:01 association, and identified a more statistically significant effect at amino acid position 65 in the peptide binding groove of HLA-DPβ1 [7], [9]. Although our study has overlapping samples with the first study to identify an independent HLA-DPB1 effect [9], these account for only 24% of the present sample set. The evidence of an HLA-DPB1 effect is strengthened by the fact that the second study reporting such a signal [7] has no overlapping samples with our study. Furthermore, we confirmed the presence of statistically independent HLA-B effects [21], [22]. Our analysis fine-mapped these to B*37:01 and B*38:01. Of these, B*37:01 can be explained by amino acid Ser99 of the HLA-B protein, which is also in this molecule's peptide-binding groove. The HLA-C locus demonstrated no convincing evidence for a statistically independent effect, suggesting that previous results may have tagged untested HLA-A or HLA-B effects across the class I region [23]. Although some of the above associations could be explained by specific amino acid polymorphisms in the corresponding HLA proteins, the picture at HLA-DRB1 however appears to be more complex as there was no single model based on amino acids that could explain the entire locus effect (including the specific effect due to DRB1*15:01). At this stage, our conservative interpretation of these results is that the implicated amino acids allow new hypotheses to be formulated for future functional studies.
An interesting finding in our analysis was the association of the region spanning from MICB to LST1, which contains several important class I, class III and cytokine-related genes. Although the identified SNPs were not significant in the initial (univariate) analysis, we established that these reached significance after adjusting for the strong DRB1*15:01 effect. One small study previously examined MICB along with DRB1*15 and had found evidence for an independent association [24]. Another study reported that variation in TNF can modify the effect of DRB1*15:01 [25]. We did not obtain evidence for statistical interaction between this locus and the other MHC variants, indicating that the MHC susceptibility variants we have catalogued likely act independently and additively in terms of MS susceptibility. Overall, we offer robust evidence for the role of a specific MS susceptibility haplotype in this region of the MHC. This region harbors evidence for association with several other diseases, e.g. Crohn's disease and ulcerative colitis [26], rheumatoid arthritis [27], Sjogren's syndrome [28], and hepatitis C virus-associated dilated cardiomyopathy [29]. However, the identity of the causal gene(s) within this associated region remains unclear at this time, but it is intriguing that three of the genes (TNF, LTA and LTB) are ligands for one of the validated MS susceptibility genes, TNFRSF1A [30]. We did not observe any evidence of statistical interaction (p>0.5) with this non-MHC locus in our data. Our preliminary analysis using cis-eQTL data in healthy individuals and MS subjects as well as the publicly available genomic data from the ENCODE and NIH Epigenomics Roadmap did not identify a single variant/gene as the likely causal one. From this information it seems that several genes have functional potential, but more detailed functional studies will be needed to unravel the causal variants and genes.
Leveraging genome-wide genotype data, the collection of analyses presented here provides a well-powered investigation of thousands of genotyped and imputed SNPs, classical alleles of 8 class I and II HLA genes and amino acid sequence variation of these HLA proteins. The combination of the large sample size with additional variation types allowed us to present an enhanced dissection of the critical role of the MHC in MS susceptibility. Our results highlight a possible role for certain residues in the peptide-binding groove of HLA molecules associated with peptide antigen recognition. In HLA-DRβ1 we identified a set of four amino acids in positions 71, 74, 57 and 86 that capture most (but not all) of the DRB1 association. Of these, Val86 has been associated previously with MS [31]–[33], and this residue appears to be important for the presentation of peptides from a putative target antigen in MS, myelin basic protein [34], and for the stability of the DRαβ dimer [35]. Another study suggested an association at position 60 [36] and another one at position 13 [37], although these were not replicated in the present study. Interestingly, the HLA-DRβ1 amino acids in positions 71 and 74 were recently also associated with susceptibility to rheumatoid arthritis [38]. Overall, consistent with the known biology of MS, it appears that disease-associated variants in HLA-DRB1 primarily influence the structural characteristics of the peptide-binding groove and presumably lead to alterations of the T cell repertoire that enhance the likelihood of an inflammatory demyelinating process. However, the MHC also harbors at least one other risk allele that does not directly affect an antigen-presenting molecule: the robust evidence supporting a risk haplotype in the vicinity of MICB will have a different mechanism, one that is likely to affect the function of one or perhaps several cytokines.
This study displays an effective strategy for in-depth characterization of this complex region of the human genome. Increasing study sample sizes and more complete reference panels are likely to continue to provide a more detailed perspective on the architecture of genetic susceptibility in this region. The identified amino acid residues may help prioritize the identification of binding peptides and investigations of other potential roles that these susceptibility alleles might have in the biology of MS susceptibility aside from antigen presentation.
Materials and Methods
Samples
We used data from 8 genome-wide association studies (GWAS) of European ancestry (Table 1): (a) three GWAS of the GeneMSA [30], [39] with samples from the Netherlands (GeneMSA DU), Switzerland (GeneMSA SW), and the United States (GeneMSA US); (b) an early GWAS from the IMSGC [30], [39], [40] with samples from the United States (ISMGC US) and the United Kingdom (IMSGC UK), that was collapsed in one stratum removing the UK cases; (c) a GWAS with cases from the Brigham and Women's Hospital and controls from the MIGEN study (BWH) [30], [39]; (d) the Australia and New Zealand Multiple Sclerosis Genetics Consortium (ANZgene) [41]; (e) an unpublished GWAS set from Erasmus Medical Center in Rotterdam, the Netherlands; and (f) an unpublished GWAS collection from the Kaiser Permanente MS Research Program (Kaiser Permanente). All the above GWAS data sets were filtered with the same quality control criteria as part of an ongoing meta-analysis of Multiple Sclerosis GWAS. In each of these data sets we performed principal components analysis (PCA) to identify population outliers and to calculate covariates to control for population stratification between cases and controls.
Imputation of classical HLA type I and II alleles and respective amino acid sequences
From each GWAS we extracted SNPs within the extended MHC region (chr6:29,299,390 to 33,883,424; hg18) to impute classical alleles for class I HLA genes (HLA-A, HLA-B, and HLA-C) and class II HLA genes (HLA-DPA1, HLA-DPB1, HLA-DQA1, HLA-DQB1, and HLA-DRB1), their corresponding amino acid sequences and SNPs not captured in the genotypic platforms used. The imputation was performed with the software BEAGLE [42] using a collection of 2,767 individuals of the Type 1 Diabetes Genetics Consortium (T1DGC) with 4-digit classical allele genotyping for the above HLA genes as the reference panel. This method and reference panel have been used for fine-mapping MHC associations in HIV control [8] and seropositive rheumatoid arthritis [38]. Cases and controls from each GWAS dataset were imputed together. All variants in the reference panel were coded as biallelic markers (presence vs. absence), allowing us to use BEAGLE for the imputation. Post-imputation we excluded variants with minor allele frequency less than 1% from the analysis. Table S5 lists the imputation quality for the identified variants.
Statistical analysis
We analyzed each variant using a logistic regression model, assuming alleles have an additive effect on the log-odds scale. We also assumed the genetic effects were fixed across all eight GWAS. In each model we included the top 5 principal components to control for within-GWAS population stratification and 7 dummy variables to account for between-GWAS specific effects. Throughout the text we refer to such a model as univariate (Mu), even if several covariates were included in the model, reflecting the fact that only one MHC-specific variant is included in the model. This is the representation of the univariate model:
Mu, Univariate logistic regression model
where y is the log(odds) for the case-control status, β0 is the logistic regression intercept and βi,j is the log-additive effect for the allele j of the variant i with p alleles. In this paper, the term variant is used for any type of SNP (biallelic, triallelic, etc), two-digit HLA allele, four-digit HLA allele and amino acid position. In any case we included p-1 alleles, with the one excluded being the reference allele. Where possible we tried to select the most frequent variant in the controls as the reference allele. The five included principal components are represented in the model as l and the last block in the model represents the dummy variables included for the n studies (n-1 parameters added in the model).
To calculate an omnibus p-value for the variant, regardless of the number of alleles included in the univariate model, we used using a log-likelihood ratio test (2) comparing the likelihood L0 of the null model (3) against the likelihood L1 of the fitted model:
Log-likelihood ratio test
where D is the log-likelihood test value, also known as deviance. D follows an approximate chi-square distribution with k degrees of freedom, where k is the difference of the regressed parameters between the two models. Representation of the null model:
M0, Null logistic regression model
Besides testing variants for association, i.e. SNPs, HLA alleles and amino acids, we also fitted models that estimated the overall effect of the each of the eight HLA genes. We did so, by fitting all respective four-digit alleles of a given HLA gene in the same model. The respective p-values reflect the overall significance of the gene.
Framework for identification of statistically independent effects
In order to identify the statistically independent effects, we first tested all variants under a univariate logistic regression model and ranked them based on the p-value of the log-likelihood test. Next, in a forward stepwise fashion, we included in the logistic regression model the most statistically significant variant as a covariate, analyzed all remaining variants and ranked them based on the new p-value of the respective log-likelihood test. The models that included at least one variant as covariate are referred to as conditional throughout the text. In each iteration the null model used in the log-likelihood test was the original null model (3) with the variants that were used as covariates. We repeated the same steps until no variant or no HLA gene reached the level of statistical significance, which we a priori set to be 10−5. This statistical significance threshold accounts of 5,000 independent tests using Bonferroni correction. Although most of the variants analyzed are correlated, we chose this threshold to account also for the multiple stepwise fitted models. If no variant reached the level of significance but an HLA gene did, we kept adding variants in the overall model until the HLA gene p-value was larger than 10−5.
To compare the effects of two (or more variants), e.g. A and B, we fitted the following models: MA model with variant A, MB model with variant B, and MAB model with both variants A and B. All three model included the same other covariates. Then we used the log-likelihood test to compare MAB vs. MB and MAB vs. MA. These two comparisons represent the effects of variants A and B, respectively, in the presence of the other variant, i.e. B and A. For these comparisons we used the nominal (α = 0.05) level of statistical significance.
Statistically independent effects in the DRB1 locus
After adjusting for the most statistically significant variant, DRB1*15:01, the residual effect of the DRB1 locus, i.e. the effect of all alleles besides *15:01, was still the most statistically significant of any of the remaining variants. This led us to the hypothesis that several other DRB1 alleles could explain the overall DRB1 locus effect, already conditioning on DRB1*15:01. To identify such effects inside the DRB1 locus, we applied the above forward stepwise logistic regression approach to the four-digit DRB1 alleles. , To test the robustness of the results from the forward stepwise regression, we also applied four other statistical methods for variant selection: i) lasso, [43] ii) elastic net, [44] iii) least angle regression, [45] and iv) forward Stagewise regression. [46] For the lasso and elastic net we selected the largest value of lambda (l1) after 10-fold cross-validation, such that error was within 1 standard error of the minimum mean cross-validated error. In the respective results section, we illustrate that all methods reached the same conclusion independently.
DRB1*15:01:DQB1*06:02 extended haplotypes (diplotypes)
It has been proposed that extended DRB1*15:01–DQB1*06:02 haplotypes confer the risk for MS rather than individual HLA alleles. To test this hypothesis, we used the post-imputation phased data to estimate the DRB1*15:01–DQB1*06:02 diplotypes. Then we fitted a logistic regression that estimated the effect of the diplotype under a per-allelic model. Since this approach used phased data, rather than post-imputation probabilities, the imputation uncertainty is not properly accounted for. Thus, we expect the respective p-values to be slightly inflated.
Functional analysis of the MICB-LST1 region
To investigate the functional potential of the MICB-LST1 region we queried:
-
in-house cis-eQTL (expression quantitative trait loci) in PBMCs of 213 MS subjects [11] and CD4+ T cells and CD14+ monocytes of 211 healthy individuals. The PBMCs gene expression levels were quantified with mRNA derived from of 213 subjects of European ancestry with relapsing remitting (RR) multiple sclerosis (MS) via an Affymetrix Human Genome U133 Plus 2.0 Array. The expression levels were adjusted for confounding factors, such as subject's use of immunomodulatory drugs, age, gender, and batch effects via principle components analysis. Associations between SNP genotypes and adjusted expression residual traits were conducted by Spearman rank correlation (SRC). For the cis analysis, we considered only SNPs within a 2 Mbps window from the transcript start site (TSS) of genes. Furthermore, we also explored any possible effect to the expression of class I and II HLA classical genes, regardless of their physical distance. Significance of the nominal p-values was determined by comparing the distribution of the most significant p values generated by permuting expression phenotypes 10,000 times independently for each gene. We call a cis-eQTL significant if the nominal association p-value is greater than the 0.05 tail of the minimal p-value distribution resulting from the permuted associations, which corresponds to a p-value of 1×10−05. Similar methods were used to evaluate the cis-regulatory effects in CD4+ T lymphocytes and CD14+ monocytes data sets consisting of 211 healthy individuals of European ancestry. These analyses were conducted under the auspices of a protocol approved by the institutional review board of Partners Healthcare.
-
publicly available data from the ENOCDE [12] and NIH Epigenomics Roadmap [13]. Specifically we perused data for functional potential from CD4 memory primary cells (H3K4me3, H3K27ac, H3K36me3, chromatin states), CD4 naïve primary cells (H3K4me3, H3K27ac, H3K36me3, chromatin states), CD8 memory primary cells (H3K4me3, H3K27ac, H3K36me3, chromatin states), CD8 naïve primary cells (H3K4me3, H3K27ac, H3K36me3, chromatin states), CD4 primary cells (DNase hypersensitivity sites), CD8 primary cells (DNase hypersensitivity sites), Treg primary cells (H3K4me3, H3K36me3, DNase hypersensitivity sites), Th1 (DNase hypersensitivity sites), Th2 (DNase hypersensitivity sites), Th17 (DNase hypersensitivity sites) and GM12878 cell line (B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus; H3K4me3, H3K27ac, H3K36me3, chromatin states, DNase hypersensitivity sites).
Variance explained
We used Nagelkerke's pseudo-R [47] to estimate the variance explained
Nagelkerke's pseudo-R2
where L0 and L1 are the likelihoods of the null model and fitted model respectively, and N is the number of individuals.
Software
We used PLINK for the initial analysis of the data and to estimate minor allele frequencies and imputation quality metrics, i.e. INFO score. [48] We fitted all models in R using the glm function and package lars and glmnet.
Ethics statement
This investigation has been approved by the Institutional Review Board of Partners Healthcare; the reference number is 2002p000434.
Supporting Information
Zdroje
1. RamagopalanSV, KnightJC, EbersGC (2009) Multiple sclerosis and the major histocompatibility complex. Curr Opin Neurol 22: 219–225.
2. FernandoMM, StevensCR, WalshEC, De JagerPL, GoyetteP, et al. (2008) Defining the role of the MHC in autoimmunity: a review and pooled analysis. PLoS Genet 4: e1000024.
3. OksenbergJR, BarcellosLF, CreeBA, BaranziniSE, BugawanTL, et al. (2004) Mapping multiple sclerosis susceptibility to the HLA-DR locus in African Americans. Am J Hum Genet 74: 160–167.
4. StewartGJ, TeutschSM, CastleM, HeardRN, BennettsBH (1997) HLA-DR, -DQA1 and -DQB1 associations in Australian multiple sclerosis patients. Eur J Immunogenet 24: 81–92.
5. MarrosuMG, MurruMR, CostaG, MurruR, MuntoniF, et al. (1998) DRB1-DQA1-DQB1 loci and multiple sclerosis predisposition in the Sardinian population. Hum Mol Genet 7: 1235–1237.
6. de BakkerPI, McVeanG, SabetiPC, MirettiMM, GreenT, et al. (2006) A high-resolution HLA and SNP haplotype map for disease association studies in the extended human MHC. Nat Genet 38: 1166–1172.
7. SawcerS, HellenthalG, PirinenM, SpencerCC, PatsopoulosNA, et al. (2011) Genetic risk and a primary role for cell-mediated immune mechanisms in multiple sclerosis. Nature 476: 214–219.
8. PereyraF, JiaX, McLarenPJ, TelentiA, de BakkerPI, et al. (2010) The major genetic determinants of HIV-1 control affect HLA class I peptide presentation. Science 330: 1551–1557.
9. FieldJ, BrowningSR, JohnsonLJ, DanoyP, VarneyMD, et al. (2010) A polymorphism in the HLA-DPB1 gene is associated with susceptibility to multiple sclerosis. PLoS One 5: e13454.
10. SimpsonEH (1951) The Interpretation of Interaction in Contingency Tables. J R Stat Soc Series B 13: 238–241.
11. RajT, KuchrooM, ReplogleJM, RaychaudhuriS, StrangerBE, et al. (2013) Common risk alleles for inflammatory diseases are targets of recent positive selection. Am J Hum Genet 92: 517–529.
12. BernsteinBE, BirneyE, DunhamI, GreenED, GunterC, et al. (2012) An integrated encyclopedia of DNA elements in the human genome. Nature 489: 57–74.
13. Roadmap Epigenomics Project. http://www.roadmapepigenomics.org.
14. ErnstJ, KellisM (2012) ChromHMM: automating chromatin-state discovery and characterization. Nat Methods 9: 215–216.
15. KaushanskyN, AltmannDM, AscoughS, DavidCS, LassmannH, et al. (2009) HLA-DQB1*0602 determines disease susceptibility in a new “humanized” multiple sclerosis model in HLA-DR15 (DRB1*1501;DQB1*0602) transgenic mice. J Immunol 183: 3531–3541.
16. KaushanskyN, AltmannDM, DavidCS, LassmannH, Ben-NunA (2012) DQB1*0602 rather than DRB1*1501 confers susceptibility to multiple sclerosis-like disease induced by proteolipid protein (PLP). J Neuroinflammation 9: 29.
17. BrynedalB, DuvefeltK, JonasdottirG, RoosIM, AkessonE, et al. (2007) HLA-A confers an HLA-DRB1 independent influence on the risk of multiple sclerosis. PLoS One 2: e664.
18. Fogdell-HahnA, LigersA, GronningM, HillertJ, OlerupO (2000) Multiple sclerosis: a modifying influence of HLA class I genes in an HLA class II associated autoimmune disease. Tissue Antigens 55: 140–148.
19. BergamaschiL, BanM, BarizzoneN, LeoneM, FerranteD, et al. (2011) Association of HLA class I markers with multiple sclerosis in the Italian and UK population: evidence of two independent protective effects. J Med Genet 48: 485–492.
20. CreeBA, RiouxJD, McCauleyJL, GourraudPA, GoyetteP, et al. (2010) A major histocompatibility Class I locus contributes to multiple sclerosis susceptibility independently from HLA-DRB1*15:01. PLoS One 5: e11296.
21. HealyBC, LiguoriM, TranD, ChitnisT, GlanzB, et al. (2010) HLA B*44: protective effects in MS susceptibility and MRI outcome measures. Neurology 75: 634–640.
22. RiouxJD, GoyetteP, VyseTJ, HammarstromL, FernandoMM, et al. (2009) Mapping of multiple susceptibility variants within the MHC region for 7 immune-mediated diseases. Proc Natl Acad Sci U S A 106: 18680–18685.
23. YeoTW, De JagerPL, GregorySG, BarcellosLF, WaltonA, et al. (2007) A second major histocompatibility complex susceptibility locus for multiple sclerosis. Ann Neurol 61: 228–236.
24. Fernandez-MoreraJL, Rodriguez-RoderoS, TunonA, Martinez-BorraJ, Vidal-CastineiraJR, et al. (2008) Genetic influence of the nonclassical major histocompatibility complex class I molecule MICB in multiple sclerosis susceptibility. Tissue Antigens 72: 54–59.
25. AllcockRJ, de la ConchaEG, Fernandez-ArqueroM, VigilP, ConejeroL, et al. (1999) Susceptibility to multiple sclerosis mediated by HLA-DRB1 is influenced by a second gene telomeric of the TNF cluster. Hum Immunol 60: 1266–1273.
26. GlasJ, MartinK, BrunnlerG, KoppR, FolwacznyC, et al. (2001) MICA, MICB and C1_4_1 polymorphism in Crohn's disease and ulcerative colitis. Tissue Antigens 58: 243–249.
27. Lopez-ArbesuR, Ballina-GarciaFJ, Alperi-LopezM, Lopez-SotoA, Rodriguez-RoderoS, et al. (2007) MHC class I chain-related gene B (MICB) is associated with rheumatoid arthritis susceptibility. Rheumatology (Oxford) 46: 426–430.
28. BolstadAI, Le HellardS, KristjansdottirG, VasaitisL, KvarnstromM, et al. (2012) Association between genetic variants in the tumour necrosis factor/lymphotoxin alpha/lymphotoxin beta locus and primary Sjogren's syndrome in Scandinavian samples. Ann Rheum Dis 71: 981–988.
29. ShichiD, KikkawaEF, OtaM, KatsuyamaY, KimuraA, et al. (2005) The haplotype block, NFKBIL1-ATP6V1G2-BAT1-MICB-MICA, within the class III-class I boundary region of the human major histocompatibility complex may control susceptibility to hepatitis C virus-associated dilated cardiomyopathy. Tissue Antigens 66: 200–208.
30. De JagerPL, JiaX, WangJ, de BakkerPI, OttoboniL, et al. (2009) Meta-analysis of genome scans and replication identify CD6, IRF8 and TNFRSF1A as new multiple sclerosis susceptibility loci. Nat Genet 41: 776–782.
31. AllenM, Sandberg-WollheimM, SjogrenK, ErlichHA, PettersonU, et al. (1994) Association of susceptibility to multiple sclerosis in Sweden with HLA class II DRB1 and DQB1 alleles. Hum Immunol 39: 41–48.
32. Saruhan-DireskeneliG, EsinS, Baykan-KurtB, OrnekI, VaughanR, et al. (1997) HLA-DR and -DQ associations with multiple sclerosis in Turkey. Hum Immunol 55: 59–65.
33. TeutschSM, BennettsBH, BuhlerMM, HeardRN, StewartGJ (1999) The DRB1 Val86/Val86 genotype associates with multiple sclerosis in Australian patients. Hum Immunol 60: 715–722.
34. WucherpfennigKW, SetteA, SouthwoodS, OseroffC, MatsuiM, et al. (1994) Structural requirements for binding of an immunodominant myelin basic protein peptide to DR2 isotypes and for its recognition by human T cell clones. J Exp Med 179: 279–290.
35. VerreckFA, TermijtelenA, KoningF (1993) HLA-DR beta chain residue 86 controls DR alpha beta dimer stability. Eur J Immunol 23: 1346–1350.
36. BarcellosLF, SawcerS, RamsayPP, BaranziniSE, ThomsonG, et al. (2006) Heterogeneity at the HLA-DRB1 locus and risk for multiple sclerosis. Hum Mol Genet 15: 2813–2824.
37. RamagopalanSV, McMahonR, DymentDA, SadovnickAD, EbersGC, et al. (2009) An extension to a statistical approach for family based association studies provides insights into genetic risk factors for multiple sclerosis in the HLA-DRB1 gene. BMC Med Genet 10: 10.
38. RaychaudhuriS, SandorC, StahlEA, FreudenbergJ, LeeHS, et al. (2012) Five amino acids in three HLA proteins explain most of the association between MHC and seropositive rheumatoid arthritis. Nat Genet 44: 291–296.
39. PatsopoulosNA, EspositoF, ReischlJ, LehrS, BauerD, et al. (2011) Genome-wide meta-analysis identifies novel multiple sclerosis susceptibility loci. Ann Neurol 70: 897–912.
40. HaflerDA, CompstonA, SawcerS, LanderES, DalyMJ, et al. (2007) Risk alleles for multiple sclerosis identified by a genomewide study. N Engl J Med 357: 851–862.
41. (ANZgene) AaNZMSGC (2009) Genome-wide association study identifies new multiple sclerosis susceptibility loci on chromosomes 12 and 20. Nat Genet 41: 824–828.
42. BrowningBL, BrowningSR (2009) A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals. Am J Hum Genet 84: 210–223.
43. TibshiraniR (1996) Regression shrinkage and selection via the lasso. J Royal Statist Soc B 58: 267–288.
44. ZouHH (2005) T (2005) Regularization and variable selection via the elastic net. J R Stat Soc Series B Stat Methodol 67: 301–320.
45. EfronBHT (2012) JohnstoneI (2012) TibshiraniR (2012) Least angle regression. Ann Stat 32: 407–499.
46. HastieTTJ (2007) TibshiraniR (2007) WaltherG (2007) Forward stagewise regression and the monotone lasso. Electron J Stat 1: 1–29.
47. NagelkerkeN (1991) A note on a general definition of the coefficient of determination. Biometrika 78: 691–692.
48. PurcellS, NealeB, Todd-BrownK, ThomasL, FerreiraMA, et al. (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81: 559–575.
49. PettersenEF, GoddardTD, HuangCC, CouchGS, GreenblattDM, et al. (2004) UCSF Chimera–a visualization system for exploratory research and analysis. J Comput Chem 25: 1605–1612.
Štítky
Genetika Reprodukční medicínaČlánek vyšel v časopise
PLOS Genetics
2013 Číslo 11
- Primární hyperoxalurie – aktuální možnosti diagnostiky a léčby
- Srdeční frekvence embrya může být faktorem užitečným v předpovídání výsledku IVF
- Akutní intermitentní porfyrie
- Vztah užívání alkoholu a mužské fertility
- Šanci na úspěšný průběh těhotenství snižují nevhodné hladiny progesteronu vznikající při umělém oplodnění
Nejčtenější v tomto čísle
- Genetic and Functional Studies Implicate Synaptic Overgrowth and Ring Gland cAMP/PKA Signaling Defects in the Neurofibromatosis-1 Growth Deficiency
- RNA∶DNA Hybrids Initiate Quasi-Palindrome-Associated Mutations in Highly Transcribed Yeast DNA
- The Light Skin Allele of in South Asians and Europeans Shares Identity by Descent
- Roles of XRCC2, RAD51B and RAD51D in RAD51-Independent SSA Recombination