The Hourglass and the Early Conservation Models—Co-Existing Patterns of Developmental Constraints in Vertebrates
Developmental constraints have been postulated to limit the space of feasible phenotypes and thus shape animal evolution. These constraints have been suggested to be the strongest during either early or mid-embryogenesis, which corresponds to the early conservation model or the hourglass model, respectively. Conflicting results have been reported, but in recent studies of animal transcriptomes the hourglass model has been favored. Studies usually report descriptive statistics calculated for all genes over all developmental time points. This introduces dependencies between the sets of compared genes and may lead to biased results. Here we overcome this problem using an alternative modular analysis. We used the Iterative Signature Algorithm to identify distinct modules of genes co-expressed specifically in consecutive stages of zebrafish development. We then performed a detailed comparison of several gene properties between modules, allowing for a less biased and more powerful analysis. Notably, our analysis corroborated the hourglass pattern at the regulatory level, with sequences of regulatory regions being most conserved for genes expressed in mid-development but not at the level of gene sequence, age, or expression, in contrast to some previous studies. The early conservation model was supported with gene duplication and birth that were the most rare for genes expressed in early development. Finally, for all gene properties, we observed the least conservation for genes expressed in late development or adult, consistent with both models. Overall, with the modular approach, we showed that different levels of molecular evolution follow different patterns of developmental constraints. Thus both models are valid, but with respect to different genomic features.
Published in the journal:
. PLoS Genet 9(4): e32767. doi:10.1371/journal.pgen.1003476
Category:
Research Article
doi:
https://doi.org/10.1371/journal.pgen.1003476
Summary
Developmental constraints have been postulated to limit the space of feasible phenotypes and thus shape animal evolution. These constraints have been suggested to be the strongest during either early or mid-embryogenesis, which corresponds to the early conservation model or the hourglass model, respectively. Conflicting results have been reported, but in recent studies of animal transcriptomes the hourglass model has been favored. Studies usually report descriptive statistics calculated for all genes over all developmental time points. This introduces dependencies between the sets of compared genes and may lead to biased results. Here we overcome this problem using an alternative modular analysis. We used the Iterative Signature Algorithm to identify distinct modules of genes co-expressed specifically in consecutive stages of zebrafish development. We then performed a detailed comparison of several gene properties between modules, allowing for a less biased and more powerful analysis. Notably, our analysis corroborated the hourglass pattern at the regulatory level, with sequences of regulatory regions being most conserved for genes expressed in mid-development but not at the level of gene sequence, age, or expression, in contrast to some previous studies. The early conservation model was supported with gene duplication and birth that were the most rare for genes expressed in early development. Finally, for all gene properties, we observed the least conservation for genes expressed in late development or adult, consistent with both models. Overall, with the modular approach, we showed that different levels of molecular evolution follow different patterns of developmental constraints. Thus both models are valid, but with respect to different genomic features.
Introduction
Developmental constraints have been suggested to play an important role in shaping the evolution of embryonic development in animals. Briefly, the concept of developmental constraints assumes that the scope of developmental mechanisms limits the set of phenotypes that may evolve. Thus, morphological similarities between embryos of different species could reflect these underlying constraints [1]. Two main models of embryonic developmental constraints have been put forward. The early conservation model predicts that the highest developmental constraints occur at the beginning of embryogenesis. This corresponds to von Baer's third law [2], postulating that embryos of different species progressively diverge from one another during ontogeny. However, in modern times, the highest morphological similarity between embryos of different species was observed in the phylotypic stage (i.e., mid-embryogenesis) [3]–[5]. Consequently, Duboule [6] and Raff [7] proposed the so-called hourglass model, which has since become widely accepted (see, e.g., [8], [9]). It predicts the highest developmental constraints during mid-embryogenesis.
At the genomic level, the hourglass model was originally linked to the expression of Hox genes in animals [6]. More recently, the emphasis has shifted to the relation, if any, between developmental constraints and the evolution and function of the genome (reviewed in [9]). Different studies have reported several characteristics supporting the hourglass model in animals on the genomic level. Hazkani-Covo et al. [10] reported the highest protein sequence similarity between mouse and human for genes expressed in mid-development. In two influential papers, Domazet-Lošo and Tautz [11] reported that the genes expressed in mid-development of zebrafish are older than genes expressed early or late, while Kalinka et al. [12] showed that genes expressed in mid-development of fruit flies have the highest expression conservation. Similarly, Irie and Kuratani [13] reported the highest expression conservation between zebrafish, frog, chicken and mouse, for genes expressed in mid-development. Very recently, the hourglass model was argued to hold also for plants embryogenesis with respect to gene age and sequence conservation [14]. However, some of these results do not hold out under detailed analyses (see Box 1 and Text S1). For example, applying a standard log-transformation [15], [16] to microarray signal intensities used in [11] changes the reported pattern such that it no longer supports the hourglass model (Figure 1). Moreover, other studies have also found genetic patterns supporting an early conservation model [17], [18].
Box 1. Transcriptome Age Index
Recent results of Domazet-Lošo and Tautz [11] suggest that the oldest transcriptome set is expressed at the phylotypic stage, and that younger sets are expressed during early and late development, which support the hourglass model. To study the relationship between gene expression, ontogeny and phylogeny, the authors proposed a measure called the “transcriptome age index”, or TAI. The TAI was defined as the mean of the phylogenetic ranks (“phylostrata”) across genes, weighted by their microarray signal intensity values at each developmental stage. Note that the microarray signal intensity values used in [11] displayed a log-normal distribution and spanned from 1 to (Figure S1). Using these values to calculate TAI made the weights of phylogenetic ranks differ by five orders of magnitude between lowly and highly expressed genes. Consequently, only the most expressed genes (Figure S2), and potentially outliers (Figure S3), contributed to the hourglass pattern discovered with TAI. We found that applying a standard log-transformation to the intensity values changes the pattern, which then indicates older genes being expressed preferentially in early development (Figure 1). The use of log-transformed data for microarray intensities is generally encouraged [15], [16] because it keeps the biological signal, while removing dependency between variance and intensity of the analyzed signals. We present a more detailed re-analysis of the study of Domazet-Lošo and Tautz [11] in Text S1 (Figures S1, S2, S3, S4, S5, S6). We also discuss in Text S1 the study of Quint et al. [14] that reported an hourglass pattern in plant embryogenesis using the same methodology (Figures S7 and S8).
In most of the studies of developmental constraints the authors compared descriptive statistics of all genes across all developmental time-points (e.g., median expression [17], weighted mean age [11], mean expression correlation [13]). Such an approach introduces dependencies between the sets of genes which are compared, and consequently can produce results biased by genes expressed at many time-points. For example, housekeeping genes contribute to the average gene expression at all time points, and hence dilute trends. To overcome this essential problem, we have used a modularization approach, which we applied to the recently published transcriptome data of zebrafish development [11]. We decomposed the genes into independent sets, i.e., modules, that contained genes overexpressed solely in one of seven developmental stages: cleavage/blastula, gastrula, segmentation, pharyngula, larva, juvenile and adult. This decomposition allowed us to compare only sets of genes that have specific functions during embryonic development. For each of the seven modules, we studied five properties of its genes: 1) gene sequence conservation, 2) gene age, 3) gene expression conservation, 4) gene orthology relationships, and 5) regulatory elements conservation.
Here, we show that different levels of molecular evolution follow different patterns of developmental constraints. First, the regulatory elements are most conserved for transcription factors expressed at mid-development, consistent with the hourglass model. Contrary to what has been reported previously [10], [11], [13], we did not detect the hourglass pattern for gene sequence, age and expression. Second, constraints on gene duplication and on new gene introduction are the strongest in early development, supporting the early conservation model (consistent with [17]). Finally, all gene properties displayed the least conservation in late development and adult, which is in agreement with both models of developmental constraints.
Results
Modules
Our goal was to analyze the developmental constraints acting on different gene properties. To this end we identified and analyzed groups of genes co-expressed during distinct developmental stages. We applied the Iterative Signature Algorithm (ISA) [19], [20] to the zebrafish expression data published by Domazet-Lošo and Tautz [11], which measured the dynamics of the transcriptome during development with a resolution of 60 time points. The ISA is a modularization algorithm that finds genes with similar expression profiles and groups them into so-called transcription modules. In order to detect modules of genes with specific expression during the zebrafish development, we initialized the ISA with seven idealized expression profiles that corresponded to successive developmental stages (see Text S1 and Figure S10).
We obtained seven modules, each containing genes overexpressed during one of the following developmental stages: cleavage/blastula, gastrula, segmentation, pharyngula, larva, juvenile and adult (Figure 2). Overall, the modules covered the entire development. The phylotypic stage in which the hourglass model predicts the highest evolutionary constraint corresponds to the segmentation and pharyngula modules. We will refer to these two modules as phylotypic modules. The cleavage/blastula and gastrula modules will be referred to as early modules, and larva, juvenile and adult modules as late modules.
The adjacent modules partially overlapped in their gene content. In order to allow for unbiased cross-module comparisons, genes belonging to two modules were kept in the one with the highest ISA gene score (see Methods); this concerned 534 genes in total. The seven modules, i.e., cleavage/blastula, gastrula, pharyngula, segmentation, larva, juvenile and adult, contained 444, 820, 487, 414, 415, 290 and 207 genes, respectively (see Table S3 for the lists of the genes). Overall, 3077 different genes were present in these modules, which implies a significant reduction of the number of genes being analyzed in comparison to the original data (14293 genes on the microarray). In particular, the ISA removed the bias related to the genes expressed uniformly across development (like housekeeping genes).
Functional Annotation
We verified the function of genes in modules detected by the ISA by comparing them to relevant known lists of genes. We found that the cleavage/blastula module was significantly enriched in maternal genes identified in [21] (36 genes vs. 19 expected by chance; hypergeometric test, ), and the gastrula module was highly significantly enriched in post-midblastula transition (post-MBT) genes identified in [21] (78 genes vs. 25 expected by chance; hypergeometric test, ). We confirmed the relevance of the segmentation and pharyngula modules by verifying that they were enriched in Hox genes (24 and 7 genes vs. 1 expected by chance, respectively; hypergeometric test, and , respectively), which is consistent with their role in mid-development [22]. We did not have any gold standard for genes expressed at the late stages of development. However, since the early and phylotypic modules were enriched in genes with relevant functions, we are confident that the same is true for the late modules.
Moreover, gene ontology (GO) enrichment analysis confirmed that genes from the modules were enriched in functions relevant to the respective developmental stages. For example, the cleavage/blastula module was enriched in genes involved in protein phosphorylation and dephosphorylation processes, which is consistent with kinase-dependent control of cell cycle and regulation of mid-blastula transition (MBT) in vertebrates [23], [24]. The pharyngula module was enriched in genes associated with cell differentiation, and anatomical structure development. Finally, the adult module was enriched in genes involved in responses to environment, although not significantly (Table S2).
Sequence Conservation
We checked whether the sequences of genes from different modules evolved under different selective pressure. To this end, we calculated the non-synonymous to synonymous substitution ratios () for genes in the modules and asked if the ratio was significantly lower for any of them. With the early conservation model, we would expect the lowest values for genes from early modules. Whereas with the hourglass model, we would expect the lowest values for genes from the phylotypic modules. In the cleavage/blastula module the median was not different from the median for all genes (equal to 0.15). In the other four modules covering embryonic development the median was lower than the median for all genes (Figure 3A), and the difference was significant for all but the segmentation module (randomization test, for the gastrula, pharyngula and larva modules). In the juvenile module, the median was significantly higher than the median for all genes (randomization test, ). In the adult module, the median was also higher than the median for all genes, but the difference was not significant. When analyzing separately sites under purifying selection or evolving neutrally, we also find weaker purifying selection during post-embryonic stages (see Text S1 and Figure S11).
These results were consistent with the study by Roux and Robinson-Rechavi [17], who also reported equally low values during the entire zebrafish embryogenesis, and a small increase in mid-larva, juvenile and adult. In contrast, Hazkani-Covo et al. [10] reported an hourglass pattern for protein distance between mouse and human genes expressed during development. However, the trend was not significant. In [17] some evidence for early conservation was reported in mouse. Projecting the genes from zebrafish modules to mouse-human orthologs, we found equal conservation across development (Figure S12). Overall, data analyses support similar evolutionary constraints on sequences of genes expressed during whole embryogenesis of zebrafish, while for mouse more developmental data is needed to be conclusive.
Gene Age
The differences in age of genes expressed during different stages of the development have been suggested to be a good indicator of evolutionary constraints [11], [25]. Thus, we investigated the age of genes belonging to different modules. We dated each gene by its first appearance in the phylogeny and assigned it to one of the five age groups: 1) Fungi/Metazoa, 2) Bilateria, 3) Coelomata+Chordata, 4) Euteleostomi and 5) Clupeocephala+Danio rerio. Next, for each module we calculated the age distribution of its genes, i.e., the number of genes belonging to each age group, and compared it with the age distribution of all genes.
For all but the cleavage/blastula module we detected significant age variations which differed across modules (Figure 3B; chi-square goodness of fit test, all ). The oldest genes which belong to the Fungi/Metazoa class were overrepresented in the gastrula module (36.7% of genes in the module vs. 25.7% of all genes). The younger Bilateria genes were overrepresented in the phylotypic modules (45.5% and 52.1% of genes in the segmentation and pharyngula modules, respectively, vs. 34.4% of all genes). The youngest genes were overrepresented in the late modules (e.g., for Euteleostomi genes: 25.7%, 35.1% and 35.6% of such genes in larva, juvenile and adult modules, respectively, vs. 18% of all genes). In contrast, Domazet-Lošo and Tautz [11] reported that genes expressed in early and late development tend to be younger than genes expressed in mid-development, supporting the hourglass model. Yet, that result does not hold for log-transformed gene expression levels (Box 1), and is not recovered with measures of gene age other than the transcriptome age index (see Text S1 and Figure S6). With the modular approach we observed that the age of expressed genes decreased throughout ontogeny. This pattern suggests that the oldest evolutionary stages tend to express the oldest genes.
Gene Family Size
Both gene duplication and gene loss can impact phenotypic evolution [26]–[30]. The outcome of these events can be summarized by the resulting gene family size. Consequently, constrained developmental stages should display less changes in gene family size than other stages. To test this hypothesis, for each zebrafish module we calculated the number of its genes that were in 1) one-to-one, 2) one-to-many, 3) many-to-many, and 4) no orthology relation to mouse genes (i.e., no ortholog detectable by the criteria used in Ensembl Compara [31]).
We compared the observed distributions with the distribution of the ortholog relationships for all genes. We detected significant variations of the ortholog relationship for the cleavage/blastula module and for all three late modules (chi-square goodness of fit test, all ). Moreover, the pattern of variation itself differed across different modules. The number of one-to-one orthologs decreased throughout development (Figure 3C). It was significantly higher than expected only in the cleavage/blastula module (54.6% of genes in the module vs. 45.4% of all genes). In contrast, the number of genes with no orthologous relationship increased throughout development (Figure 3C). It was significantly higher than expected only in the juvenile and adult modules (38.2% and 38.4% of genes in the two modules, respectively, vs. 20.4% of all genes), consistent with the excess of “young” genes. A similar pattern was observed for many-to-many orthologs (10.4% and 7.8% of genes in the two modules, respectively, vs. 3.9% of all genes). Finally, the number of one-to-many orthologs was higher than expected only in the larva module (45.6% of genes in the module vs. 30.3% of all genes), and did not differ from expectation in all other modules.
These results were consistent with [17] in which the genes retained in duplicates after the teleost-specific whole genome duplication were reported to have low expression early in the development. Here, we recovered an analogous pattern with the modular approach, showing that the genes expressed early in the development are retained in duplicates less often than genes expressed later. Note that our observation is not limited to whole genome duplication. In addition, we detected the highest number of novel genes amongst genes expressed late in the development.
Expression Conservation
Changes in gene expression are one of the main sources of morphological variation [32]–[34]. The developmental constraints on gene expression might differ from those on the gene sequence [35]–[37]. Thus, for each module, we compared the mean expression profile of its genes with the mean expression profile of their one-to-one orthologs in mouse. We used two different data sets [13], [38] with expression values of mouse genes during the development. The use of two data sets was necessary, because there does not exist a single experiment covering the entire mouse development. The incompatibility of the two microarrays impaired the statistical strength of the analysis. For this reasons the results reported here should be regarded rather as qualitative than quantitative.
Since homology cannot be defined for individual developmental stages between zebrafish and mouse, we first mapped every time point to its broad metastage defined in Bgee database [39] (Figure 4). Next, we calculated the mean expression level in every metastage. This resulted in six expression values for each gene during the development of mouse and zebrafish: zygote, cleavage, blastula, neurula, organogenesis, and post-embryonic stage. Note that the mouse microarrays did not cover the gastrula stage at all. For each module we calculated the Pearson's correlation between the mean expression of its genes and their mouse orthologs across the six metastages. For the cleavage/blastula module no correlation was detected, probably due to the incompatibility of the two mouse microarrays. Nevertheless, there exists a plausible, biological interpretation of the differences in gene expression between the early stages of zebrafish and mouse development. Zebrafish and mouse form two different embryological structures during blastulation, a blastula and a blastocyst, respectively. The blastocyst is a mammalian innovation that consists of an embryoblast (that develop into structures of the fetus) and a trophoblast (that form the extraembryonic tissue). In contrast, there is no extraembryonic tissue in zebrafish. Overall, the lack of correlation between gene expression for the early stages of mouse and zebrafish development could be explained by these structural differences. For other modules the correlation was positive (Figure 3D), however due to the low number of data points in the analysis, no correlation values were significant (all ).
These results stood in contrast with the report by Irie and Kuratani [13] who showed the highest conservation of gene expression in mid-development. However, a re-analysis of their data suggested that this observation was not significant (see Text S1 and Figure S9). Also, both their and our studies shared problems related to the use of two data sets from different sources to cover mouse development. This and the lack of a straightforward homology between ontogenies of different species made it difficult to conclude on the conservation of gene expression during vertebrate development.
Regulatory Regions
The cis-regulatory hypothesis asserts that most morphological evolution is due to changes in cis-regulatory sequences [40]–[42]. A reasonable prediction of this hypothesis is slower cis-element turnover in morphologically conserved developmental periods. We examined the presence of highly conserved non-coding elements (HCNEs) [43] and of transposon-free regions (TFRs) [44] in the proximity of genes from each module. In the analysis of HCNEs, we counted their number between zebrafish and mouse (detected with 70% identity) in regions of 500 base pairs upstream from the transcription start site. We found that only genes from the phylotypic modules were significantly enriched in HCNEs (hypergeometric test, , and for segmentation and pharyngula modules, respectively). We tested the sensitivity of the results by changing the analyzed regions' length to 200 and 1000 base pairs upstream from the transcription start site, by looking for HCNEs in introns, and using HCNEs detected with identity of 90%. In all cases, we obtained similar results (see Table S1). In the analysis of TFRs, we counted the number of genes from each module that have been associated with TFRs in zebrafish. Importantly, these TFRs were reported to be conserved between vertebrates as distant as zebrafish and human. We found that only genes from the pharyngula module were significantly enriched in TFRs (hypergeometric test, ).
The highly conserved non-coding elements and transposon-free regions are often associated with developmental regulatory genes, and with transcription factors (TFs) in particular [43]–[47]. In order to confirm this association, we calculated the fractions of genes with HCNEs or with TFRs in their proximity. We observed that for both features this fraction was higher for TFs than for all genes. Importantly, we observed that only the phylotypic modules were enriched in TFs (Figure 3E). This partially explained the enrichment in HCNEs and TFRs for genes expressed in mid-development. In addition, HCNEs were more often present in the proximity of TFs from the pharyngula module than in the proximity of TFs in general (Figure 3E; 8.8% of TFs from the pharyngula module had at least one HCNE in their proximity, and only 3.7% of all TFs had at least one HCNEs in their proximity). Also TFRs were more often present in the proximity of TFs from the phylotypic modules than in the proximity of TFs in general (Figure 3E; 31% and 45% of TFs from the segmentation and pharyngula modules, respectively, had TFRs in their proximity, and only 26% of all TFs had TFRs in their proximity). Consequently, the enrichment in HCNEs and TFRs for genes expressed in the phylotypic stage seems to be related to the regulation of developmental processes. Interestingly, only few Hox genes from phylotypic modules were associated with HCNEs (four Hox genes from segmentation module), and with TFRs (six Hox genes from segmentation module, and one Hox gene from pharyngula module).
In addition, we checked for genes that preserved their specific ancestral order in the genome across metazoans (so called conserved ancestral microsyntenic pairs, [48]) and are known to be involved in the regulation of development. We found that they were slightly overrepresented in the segmentation module, but only at the limit of statistical significance (see Text S1).
Finally, we checked for core developmental genes in each module (see [47] for the list of genes). These genes are known to be involved in the regulation of development, and to have highly conserved regulatory regions within different taxa, including, nematodes, insects and vertebrates [47]. We detected a significant enrichment in these genes only in the pharyngula module (20 core genes; hypergeometric test, ), supporting the hourglass model.
Discussion
Our goal was to study developmental constraints acting on various gene properties in vertebrates. Overall, we analyzed and compared five gene characteristics, namely the conservation of gene sequence, gene expression, and regulatory elements, as well as age and orthology relationships. To this end we identified distinct sets of genes with time-specific expression in zebrafish development, i.e., genes over-expressed in one of the seven consecutive stages: cleavage/blastula, gastrula, segmentation, pharyngula, larva, juvenile and adult. We believe that the change in expression level is a reliable indicator of gene involvement in different stages, although genes might also play a role outside the stages of their highest expression. Moreover, the modules contained genes overexpressed in relation to other stages, regardless of the absolute values of their expression. Thus, lowly expressed genes were also considered by the modularization algorithm, as long as they displayed some variance in expression levels over developmental time.
Several features do not show any significant pattern over embryonic development, often in contradiction to previous reports. There is notably no evidence for change in selective pressure acting on sequences of protein-coding genes (i.e., ) over development (in contrast to [10]). Unfortunately, the available data does not allow a strong conclusion concerning the conservation of expression (in contrast to [13]), despite the probable importance of this feature in the evolution of development. In this respect, the situation in vertebrates stands in contrast to the relatively clear results in flies [12], where the evolution of expression has been shown to be most constrained in mid-development.
Gene orthology relations support the early conservation model. We show that early stages are less prone to tolerate both gene duplication (consistent with [17]) and gene introduction. The deficit in duplication in early development could also be due to a lack of opportunities for neo- or sub-functionalization in the anatomically simpler stages, which is not exclusive with strong purifying selection. The interpretation of transcriptome age is less straightforward. Our observations suggest that the oldest evolutionary stages tend to express of the oldest genes. It is possible that early stages are evolutionarily oldest, and that this is why they are enriched in oldest genes. Consequently, it is the presence of young genes in a module that would mark relaxed developmental constraints during the corresponding stage. However, neither early nor phylotypic modules are enriched in young genes (Euteleostomi and Clupeocephala+Danio rerio), which suggests similar developmental constraints in early and mid-ontogeny. In any case, we do not find any support for the hypothesis that the phylotypic stage would be characterized by the oldest transcriptome (in contrast to [11]).
While the modularization approach does not support several previous hypotheses of genomic traces of the phylotypic period, it allows us to distinguish a strong signal of conservation of gene regulation in mid-development. While this had not yet been reported in genomic studies, it is consistent with early descriptions of the phylotypic stage as characterized by Hox genes body patterning activity [6]. Of note, the patterns that we observe are robust to the removal of Hox genes (data not shown), so they are more general than this original observation. We observed an excess of HCNEs only for genes expressed in the pharyngula module, and an excess of TFRs only for genes expressed in the phylotypic modules. The enrichment in HCNEs and TFRs has been related to developmental regulatory genes, and to transcription factors in particular [43], [45]–[47]. Indeed, we observed that more TFs were expressed in mid-development than in other stages. Also, we showed that a significant proportion of TFs expressed in mid-development had conserved regulatory regions (i.e., HCNEs and TFRs), in contrast to TFs expressed early or late. Consequently, the enrichment in HCNEs and TFRs for genes expressed in mid-development can be explained by both a higher number of TFs and a higher number of HCNEs and TFRs for these TFs, than for genes expressed earlier or later. Moreover, the pharyngula module was associated with core developmental genes. Overall, these results suggest that mid-developmental processes have extremely high conservation of regulation. This conservation could translate into observed common traits of the phylum expressed at the phenotypic level during mid-development. In addition, core developmental genes are known to be present in different taxa (e.g., nematodes, insects and vertebrates), in each of which they have a conserved regulation that evolved in parallel [47]. This could explain why the phylotypic stage is observed not only in vertebrates [49], but also in other phyla, e.g., in arthropods [4], [12].
Finally, for all of the features which we have considered there is at least some trend towards weaker evolutionary constraints in the latest stages: is higher in post-embryonic stages and there are less sites under purifying selection (Figure S11); correlation of expression is lowest for maternal, larval and adult genes; young genes and genes with duplications in fishes or other vertebrates are overrepresented in late modules; and genes expressed in juveniles and adults have the less HCNEs and TFRs. Although not all of these trends are significant, no feature shows stronger conservation in late development or adult. Thus, while different aspects of gene evolution show constraints at different times of development, there appears to be a generally faster evolution of all aspects of larval, juvenile and adult genes. Whether this is due to lower constraints (i.e., less purifying selection) or to stronger involvement in adaptation (i.e., more diversifying selection), remains an open question.
In summary, we studied evidence for, or against, any particular pattern of developmental constraints by considering sets of genes with time-specific expression patterns. Comparing such independent sets of genes with a clear function during embryogenesis resulted in cleaner and more fine-grained characterization of evolutionary patterns than previously reported. Notably, we showed that different levels of molecular evolution follow different patterns of developmental constraints. The sequence of regulatory regions is most conserved for genes expressed in mid-development, consistent with the hourglass model. Gene duplication and new gene introduction is most constrained during early development, supporting the early conservation model. Whereas, all gene properties coherently show the least conservation for the latest stages, consistent with both the early conservation and the hourglass models.
Methods
Gene Expression Data
Microarray data of zebrafish development were downloaded from NCBI's Gene Expression Omnibus [50] (GSE24616). This study was performed on the Agilent Zebrafish (V2) Gene Expression Microarray. In total, expression profiles for 60 developmental stages (from unfertilized egg to adults stages) were measured. The last ten stages (55 days–1 year 6 months) were measured separately for male and female. Two replicates were made per time point, resulting in microarrays in total. For each microarray, values of gProccessedSignal were log10 transformed and normalized as follows. Separately for each replicate, we equalized the expression signals between microarrays using the spike-ins reference, to account for different amounts of RNA present throughout development. To this aim, we first quantile normalized the expression signal of all spike-ins from all microarrays. Then, for each spike-in level we took the median value of expression signal before and after quantile normalization. This resulted in 10 pairs of expression signals (original signal vs. normalized signal). With linear interpolation between these points, we obtained a piecewise linear curve that defined a mapping from original to normalized expression signals, which we used to equalize the expression signals from all microarrays. This was done by projecting each expression signal onto the piecewise linear curve and calculating the corresponding normalized value. Finally, we quantile normalized the data within replicates and computed the mean value for each gene within replicates. Expression values measured separately for males and females were averaged for each time point.
Microarray data of mouse development were downloaded from Array Express (E-MEXP-51 and E-MTAB-368). The E-MEXP-51 study was performed on () F1 mice using Affymetrix GeneChip Murine Genome U74Av2. In total, expression profiles for 10 early developmental stages (zygote, early 2-cell, mid 2-cell, late 2-cell, 4 cell, 8 cell, 16 cell, early blastocyst, mid-blastocyst, late blastocyst) were measured. 2–4 replicates were made per time point. The data were normalized using gcRMA package.
The E-MTAB-368 study was performed on C57BL/6 mice using Affymetrix GeneChip Mouse Genome 430 2.0. In total, expression profiles for 8 mid and late developmental stages (E7.5, E8.5, E9.5, E10.5, E12.5, E14.5, E16.5, E18.5) were measured. 2–3 replicates were made per time point. The data were normalized using gcRMA package.
Mapping Probe Sets to Ensembl Genes
Agilent probe sets were mapped to their corresponding zebrafish genes (Ensembl release 63 [51]) using BioMart [52]. Probe sets which did not map unambiguously to an Ensembl gene were excluded from the analysis. A total of 19049 probe sets corresponding to 14293 zebrafish genes were taken into account in our analysis.
Affymetrix probe sets were mapped to their corresponding mouse genes (Ensembl release 63 [51]) using BioMart [52]. Probe sets which did not map unambiguously to an Ensembl gene were excluded from the analysis. For genes that were mapped by several probe sets we used the signal averaged across the probe sets. A total of 2883 mouse genes mapped by probe sets present on both mouse microarrays were taken into account in the gene expression analysis.
Iterative Signature Algorithm (ISA)
The ISA identifies modules by an iterative procedure. A detailed description of the algorithm in the general case is given in [19] (see also http://www2.unil.ch/cbg/homepage/downloads/ISA_tutorial.pdf). In this specific study, the algorithm was initialized with seven candidate seeds, each consisting of one artificial expression profile corresponding to one of the zebrafish developmental stages (see Text S1 for details). Next, these seeds were refined through iterations by adding or removing genes and developmental time points until the processes converge to stable sets, which are referred to as (transcription) modules. Each developmental time point and gene received a score indicating their membership (if non-zero) and contribution to a given module. The closest the score for a gene or developmental time point was to one, the stronger the association between the gene/developmental time point and the rest of the module.
The ISA was run twice with the following sets of thresholds: 1) and , and 2) and , for genes and developmental time points, respectively. We obtained the pharyngula module only in the case of , and all other modules with both and . All the modules contained their corresponding idealized profile. For further analysis, we kept a single module per developmental stage. From the pair of modules, we chose the one in which the idealized profile had a higher gene score. Overall, segmentation, pharyngula and juvenile modules were obtained with , and cleavage/blastula, gastrula, larva, and adult modules were obtained with .
GO Enrichment Analysis
Gene ontology (GO) association for all genes mapped by zebrafish probe sets were downloaded from Ensembl release 63 [51], using BioMart [52]. GO enrichment was tested by Fisher's exact test, using the Bioconductor package topGO [53] version 2.2.0. The reference set consisted of all Ensembl genes mapped by probe sets of the microarray used. The “elim” algorithm of topGO was used to eliminate the (tree-like) hierarchical dependency of the GO terms. To correct for multiple testing the Bonferroni correction was applied. For every module GO categories with corrected P-value lower than 0.01 were reported, if less then ten GO categories were significant we reported the top ten (see Table S2).
Gene Sequence Analysis
Ensembl Perl API release 70 [54] was used to extract all Ensembl Compara gene trees (and alignments) with a Clupeocephala (bony fishes) root. Sequences with too many gaps, or undefined nucleotides, were removed from the tree and alignment by MaxAlign (version 1.1) [55]. Only trees without duplication (one-to-one orthologs) and with at least six leaves were kept. This resulted in 6769 trees.
The site model from codeml [56] (PAML package release 4.6; models M1a and M2a in codeml) was used to predict sites-specific selection in these trees. Finally, 916 trees were removed due to the lack of zebrafish genes, and 81 were removed due to lack of expression data on the zebrafish microarray. This resulted in 5772 trees. For every gene tree we calculated its mean value ().
For every module we calculated the median ratio of its genes, where was the number of genes belonging to one of the 5772 trees. Next, we generated 10000 sets of randomly chosen genes. For each set we calculated the median ratio. Thus, we constructed a sampling distribution of the median values for a set of genes. Then we calculated the probability that the median of the original module was sampled from the constructed distribution. This allowed us to assess if the observed median ratio was significantly different from the expected median value. To correct for multiple testing we applied the Bonferroni correction. We used 0.01 as a significance level.
Gene Age Analysis
To study the age of genes belonging to different modules we dated the genes by their first appearance in the phylogeny. This consisted of retrieving the age of the oldest node of their Gene tree in Ensembl release 63 [51]. Genes' age was described with one of the following categories: Fungi/Metazoa, Bilateria, Coelomata, Chordata, Euteleostomi, Clupeocephala, and Danio rerio. To fit the chi-square test requirements (more than 5 elements in a group) we merged the genes into five age categories: Fungi/Metazoa, Bilateria, Coelomata+Chordata, Euteleostomi, Clupeocephala+Danio rerio. Next, for every module we calculated the age distribution of its genes. We performed chi-square goodness of fit test to compare the observed and expected distributions of age classes in the modules. The expected distribution was estimated by classifying all zebrafish genes into one of the five age categories. To correct for multiple testing we applied the Bonferroni correction. We used 0.01 as a significance level.
Zebrafish–Mouse Orthologous Genes
Homology information of zebrafish and mouse genes was retrieved from Ensembl release 63 [51], using BioMart [52]. A total of 17482 pairs of zebrafish-mouse orthologous genes had expression information in the zebrafish microarray data (14293 zebrafish genes and 11322 mouse genes). Among them there were 6441 one-to-one orthologous pairs, 5048 one-to-many orthologous pairs, and 2993 many-to-many orthologous pairs. 2901 zebrafish genes showed no orthology relationship with mouse genome. From further analysis we excluded 99 “apparent-one-to-one” gene pairs. For every module we calculated the number of genes that were in one-to-one, one-to-many, many-to-many and no orthology relation to mouse genes. Next, we performed chi-square goodness of fit test to compare the observed and expected distributions of orthology classes in the modules. The expected distribution was estimated by classifying all zebrafish genes into one of the four orthology categories. To correct for multiple testing we applied the Bonferroni correction. We used 0.01 as a significance level.
Gene Expression Conservation
To study expression conservation between zebrafish genes assigned to the modules and their mouse one-to-one orthologs, we used gene expression data for 2883 orthologous gene pairs (the limiting factor being the mapping to both mouse microarrays). For genes that were mapped by several probe sets we averaged their signal across the probe sets for both species. In order to compare gene expression between two species, we first calculated the mean expression for zebrafish genes present in the modules and their one-to-one mouse orthologs. Due to the incompatibility of two mouse microarray data used it was difficult to provide a meaningful comparison of expression for the two species. To calculate the correlation between expression profiles between zebrafish and mouse we reduced their expression profiles to six metastages: zygote, cleavage, blastula, neurula, organogenesis, and post-embryonic stage (see [39] for detailed definition of metastage). For every module and every metastage we calculated the mean expression level for zebrafish genes and their mouse one-to-one orthologs, and next we calculated the Pearson's correlation coefficient between them.
Highly Conserved Non-Coding Elements
Location data for highly conserved non-coding elements (HCNE) between zebrafish and mouse (70% of identity) was retrieved from Ancora [43] (http://ancora.genereg.net/downloads/danRer7/vs_mouse). The file HCNE_danRer7_mm9_70pc_50col.bed.gz was downloaded and used in the analysis. For each of the 14293 Ensembl genes considered in our analysis, we calculated the number of HCNE in regions of 500 base pairs upstream from the transcription start site. Next, for every module we performed a hypergeometric test to assess if they were significantly enriched in genes with HCNE. To correct for multiple testing we applied the Bonferroni correction. We used 0.01 as a significance level. In additional analyses, we calculated the number of HCNE in regions of 200 and 1000 base pairs upstream from the transcription start site, as well as in introns. Also, we repeated the analysis with HCNEs of 90% identity (see Text S1).
Transposon-Free Regions
Location data for transposon-free regions (TFRs) in zebrafish was retrieved from [44] (http://www.biomedcentral.com/content/supplementary/1471-2164-8-470-S1.txt). First, each TFR was associated with Ensembl ID [51] of its closest transcript from genome assembly Zv6. Then for each Ensembl transcript ID we retrieved an Ensembl gene ID from genome assembly Zv7. For every module we performed a hypergeometric test to assess if they were significantly enriched in genes with TFRs in their proximity. To correct for multiple testing we applied the Bonferroni correction. We used 0.01 as a significance level.
Transcription Factors
The set of transcription factors (TFs) was defined based on GO category annotation: GO: 0006355, regulation of transcription, DNA-dependent. Among 14293 Ensembl genes, 957 were annotated as transcription factors. For every module we performed a hypergeometric test to assess if they were significantly enriched in TFs. Next, we performed a hypergeometric test to assess if the TFs present in the modules were enriched in HCNEs and TFRs. To correct for multiple testing we applied the Bonferroni correction. We used 0.01 as a significance level.
Supporting Information
Zdroje
1. PoeS, WakeMH (2004) Quantitative tests of general models for the evolution of development. Am Nat 164: 415–22.
2. von Baer KE (1828) Ueber Entwicklungsgeschichte der Thiere: Beobachtung und Reexion. Königsberg: Bornträger.
3. SeidelF (1960) Körpergrundgestalt und keimstruktur. eine erörterung über die grundlagen der vergleichenden und experimentellen embryologie und deren gültigkeit bei phylogenetischen überlegungen. Zool Anz 164: 245–305.
4. Sander K (1983) The evolution of patterning mechanisms: gleanings from insect embryogenesis and spermatogenesis. In: Goodwin BC WC Holder N, editor, Development and evolution. Cambridge University Press, pp. 137–159.
5. Elinson R (1987) Change in developmental patterns: Embryos of amphibians with large eggs. In: Raff RA RE, editor, Development as an Evolutionary Process. New York: Alan R. Liss., pp. 1–21.
6. DubouleD (1994) Temporal colinearity and the phylotypic progression: a basis for the stability of a vertebrate bauplan and the evolution of morphologies through heterochrony. Dev Suppl 135–42.
7. Raff RA (1996) The shape of life: genes, development, and the evolution of animal form. Chicago; London: University of Chicago Press.
8. Prud'hommeB, GompelN (2010) Evolutionary biology: Genomic hourglass. Nature 468: 768–9.
9. KalinkaA, TomancakP (2012) The evolution of early animal embryos: conservation or divergence? Trends in Ecology & Evolution 27: 385–393.
10. Hazkani-CovoE, WoolD, GraurD (2005) In search of the vertebrate phylotypic stage: a molecular examination of the developmental hourglass model and von Baer's third law. J Exp Zool B Mol Dev Evol 304: 150–8.
11. Domazet-LošoT, TautzD (2010) A phylogenetically based transcriptome age index mirrors ontogenetic divergence patterns. Nature 468: 815–8.
12. KalinkaAT, VargaKM, GerrardDT, PreibischS, CorcoranDL, et al. (2010) Gene expression divergence recapitulates the developmental hourglass model. Nature 468: 811–4.
13. IrieN, KurataniS (2011) Comparative transcriptome analysis reveals vertebrate phylotypic period during organogenesis. Nat Commun 2: 248.
14. QuintM, DrostHG, GabelA, UllrichKK, BönnM, et al. (2012) A transcriptomic hourglass in plant embryogenesis. Nature
15. McDonald JH (2009) Handbook of Biological Statistics (2nd ed.). Baltimore, Maryland: Sparky House Publishing.
16. Speed T (2000) Always log spot intensities and ratios. Available: http://www.stat.berkeley.edu/users/terry/zarray/Html/log.html. Accessed 20 March 2013.
17. RouxJ, Robinson-RechaviM (2008) Developmental constraints on vertebrate genome evolution. PLoS Genet 4: e1000311 doi:10.1371/journal.pgen.1000311.
18. ComteA, RouxJ, Robinson-RechaviM (2010) Molecular signaling in zebrafish development and the vertebrate phylotypic period. Evolution & development 12: 144–156.
19. BergmannS, IhmelsJ, BarkaiN (2003) Iterative signature algorithm for the analysis of large-scale gene expression data. Phys Rev E Stat Nonlin Soft Matter Phys 67: 031902.
20. IhmelsJ, BergmannS, BarkaiN (2004) Defining transcription modules using large-scale gene expression data. Bioinformatics 20: 1993–2003.
21. AanesH, WinataCL, LinCH, ChenJP, SrinivasanKG, et al. (2011) Zebrafish mRNA sequencing deciphers novelties in transcriptome dynamics during maternal to zygotic transition. Genome Res 21: 1328–38.
22. KrumlaufR (1994) Hox genes in vertebrate development. Cell 78: 191–201.
23. HartleyRS, RempelRE, MallerJL (1996) In vivo regulation of the early embryonic cell cycle in Xenopus. Dev Biol 173: 408–19.
24. YardenA, GeigerB (1996) Zebrafish cyclin E regulation during early embryogenesis. Dev Dyn 206: 1–11.
25. IrieN, Sehara-FujisawaA (2007) The vertebrate phylotypic stage and an early bilaterian-related stage in mouse embryogenesis defined by genomic information. BMC Biol 5: 1.
26. Ohno S, et al.. (1970) Evolution by gene duplication. Berlin, Heidelberg and New York: Springer- Verlag.
27. ZhangJ (2003) Evolution by gene duplication - an update. Trends Ecol Evol 18: 292–298.
28. NeiM (2007) The new mutation theory of phenotypic evolution. Proc Natl Acad Sci U S A 104: 12235–42.
29. WangX, GrusWE, ZhangJ (2006) Gene losses during human origins. PLoS Biol 4: e52 doi:10.1371/journal.pbio.0040052.
30. DemuthJP, HahnMW (2009) The life and death of gene families. Bioessays 31: 29–39.
31. VilellaAJ, SeverinJ, Ureta-VidalA, HengL, DurbinR, et al. (2009) EnsemblCompara genetrees: Complete, duplication-aware phylogenetic trees in vertebrates. Genome Res 19: 327–35.
32. KingMC, WilsonAC (1975) Evolution at two levels in humans and chimpanzees. Science 188: 107–16.
33. PreussTM, CáceresM, OldhamMC, GeschwindDH (2004) Human brain evolution: insights from microarrays. Nat Rev Genet 5: 850–60.
34. CarrollSB (2005) Evolution at two levels: on genes and form. PLoS Biol 3: e245 doi:10.1371/journal.pbio.0030245.
35. JordanIK, Mariño-RamírezL, WolfYI, KooninEV (2004) Conservation and coevolution in the scale-free human gene coexpression network. Mol Biol Evol 21: 2058–70.
36. YanaiI, GraurD, OphirR (2004) Incongruent expression profiles between human and mouse orthologous genes suggest widespread neutral evolution of transcription control. OMICS 8: 15–24.
37. JordanIK, Marino-RamirezL, KooninEV (2005) Evolutionary significance of gene expression divergence. Gene 345: 119–126.
38. WangQT, PiotrowskaK, CiemerychMA, MilenkovicL, ScottMP, et al. (2004) A genome-wide study of gene activity reveals developmental signaling pathways in the preimplantation mouse embryo. Dev Cell 6: 133–44.
39. Bastian F, Parmentier G, Roux J, Moretti S, Laudet V, et al.. (2008) Bgee: Integrating and comparing heterogeneous transcriptome data among species. In: Bairoch A, Cohen-Boulakia S, Froidevaux C, editors, Data Integration in the Life Sciences, Springer Berlin/Heidelberg, volume 5109 of Lecture Notes in Computer Science. pp. 124–131.
40. SternDL (2000) Evolutionary developmental biology and the problem of variation. Evolution 54: 1079–91.
41. WrayGA (2007) The evolutionary signi_cance of cis-regulatory mutations. Nat Rev Genet 8: 206–16.
42. CarrollSB (2008) Evo-devo and an expanding evolutionary synthesis: a genetic theory of morphological evolution. Cell 134: 25–36.
43. EngströmPG, FredmanD, LenhardB (2008) Ancora: a web resource for exploring highly conserved noncoding elements and their association with developmental regulatory genes. Genome Biol 9: R34.
44. SimonsC, MakuninIV, PheasantM, MattickJS (2007) Maintenance of transposon-free regions throughout vertebrate evolution. BMC Genomics 8: 470.
45. SandelinA, BaileyP, BruceS, EngströmPG, KlosJM, et al. (2004) Arrays of ultraconserved noncoding regions span the loci of key developmental genes in vertebrate genomes. BMC Genomics 5: 99.
46. WoolfeA, GoodsonM, GoodeDK, SnellP, McEwenGK, et al. (2005) Highly conserved non-coding sequences are associated with vertebrate development. PLoS Biol 3: e7 doi:10.1371/journal.pbio.0030007.
47. VavouriT, WalterK, GilksWR, LehnerB, ElgarG (2007) Parallel evolution of conserved noncoding elements that target a common set of developmental regulatory genes from worms to humans. Genome Biol 8: R15.
48. IrimiaM, TenaJJ, AlexisM, Fernandez-MiñanA, MaesoI, et al. (2012) Extensive conservation of ancient microsynteny across metazoans due to cis-regulatory constraints. Genome Res
49. KimmelCB, BallardWW, KimmelSR, UllmannB, SchillingTF (1995) Stages of embryonic development of the zebrafish. Dev Dyn 203: 253–310.
50. EdgarR, DomrachevM, LashAE (2002) Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res 30: 207–10.
51. HubbardTJP, AkenBL, AylingS, BallesterB, BealK, et al. (2009) Ensembl 2009. Nucleic Acids Res 37: D690–7.
52. SmedleyD, HaiderS, BallesterB, HollandR, LondonD, et al. (2009) BioMart – biological queries made easy. BMC Genomics 10: 22.
53. AlexaA, RahnenfuhrerJ, LengauerT (2006) Improved scoring of functional groups from gene expression data by decorrelating go graph structure. Bioinformatics 22: 1600–1607.
54. FlicekP, AhmedI, Amode MRS, BarrellD, BealK, et al. (2013) Ensembl 2013. Nucleic Acids Res 41: D48–55.
55. Gouveia-OliveiraR, SackettPW, PedersenAG (2007) MaxAlign: maximizing usable data in alignment. BMC Bioinformatics 8: 312.
56. YangZ (2007) PAML 4: Phylogenetic Analysis by Maximum Likelihood. Mol Biol Evol 24: 1586–91.
57. WolfYI, NovichkovPS, KarevGP, KooninEV, LipmanDJ (2009) The universal distribution of evolutionary rates of genes and distinct characteristics of eukaryotic genes of different apparent ages. Proc Natl Acad Sci U S A 106: 7273–80.
Štítky
Genetika Reprodukční medicínaČlánek vyšel v časopise
PLOS Genetics
2013 Číslo 4
- Management pacientů s MPN a neobvyklou kombinací genových přestaveb – systematický přehled a kazuistiky
- Management péče o pacientku s karcinomem ovaria a neočekávanou mutací CDH1 – kazuistika
- Primární hyperoxalurie – aktuální možnosti diagnostiky a léčby
- Vliv kvality morfologie spermií na úspěšnost intrauterinní inseminace
- Akutní intermitentní porfyrie
Nejčtenější v tomto čísle
- The G4 Genome
- Neutral Genomic Microevolution of a Recently Emerged Pathogen, Serovar Agona
- The Histone Demethylase Jarid1b Ensures Faithful Mouse Development by Protecting Developmental Genes from Aberrant H3K4me3
- The Tissue-Specific RNA Binding Protein T-STAR Controls Regional Splicing Patterns of Pre-mRNAs in the Brain