Evidence that Adaptation in Drosophila Is Not Limited by Mutation at Single Sites

Adaptation in eukaryotes is generally assumed to be mutation-limited because of small effective population sizes. This view is difficult to reconcile, however, with the observation that adaptation to anthropogenic changes, such as the introduction of pesticides, can occur very rapidly. Here we investigate adaptation at a key insecticide resistance locus (Ace) in Drosophila melanogaster and show that multiple simple and complex resistance alleles evolved quickly and repeatedly within individual populations. Our results imply that the current effective population size of modern D. melanogaster populations is likely to be substantially larger (≥100-fold) than commonly believed. This discrepancy arises because estimates of the effective population size are generally derived from levels of standing variation and thus reveal long-term population dynamics dominated by sharp—even if infrequent—bottlenecks. The short-term effective population sizes relevant for strong adaptation, on the other hand, might be much closer to census population sizes. Adaptation in Drosophila may therefore not be limited by waiting for mutations at single sites, and complex adaptive alleles can be generated quickly without fixation of intermediate states. Adaptive events should also commonly involve the simultaneous rise in frequency of independently generated adaptive mutations. These so-called soft sweeps have very distinct effects on the linked neutral polymorphisms compared to the standard hard sweeps in mutation-limited scenarios. Methods for the mapping of adaptive mutations or association mapping of evolutionarily relevant mutations may thus need to be reconsidered.

Published in: PLoS Genet 6(6): e1000924. doi:10.1371/journal.pgen.1000924
Category: Research Article

Introduction

The speed of adaptation in eukaryotes is commonly assumed to be limited by the waiting time for an appropriate adaptive mutation. This notion is based on estimates of the population parameter Θ = 4N_eμ (proportional to the product of effective population size N_e and per-site mutation rate μ) derived from levels of standing neutral variation. Θ can be interpreted as the rate at which new mutations arise in the population [1]. In contrast to many prokaryotes or viruses, where Θ can easily be on the order of one or larger (so that most single-nucleotide mutations exist in the population at any given time), estimated values of Θ in eukaryotes are typically much smaller than one [1]. Adaptation should thus be substantially slowed, especially when adaptive alleles need to carry several independent mutations.

However, adaptation to anthropogenic changes such as the evolution of insecticide resistance has been observed to occur very rapidly and often involves complex alleles [2]–[7]. One possible explanation for such cases of rapid adaptation is that complex resistant alleles predate environmental changes [8], [9]. The other possibility is that adaptive mutations emerge more quickly in eukaryotic populations than commonly believed. The latter would imply that estimates of Θ have to be reconsidered in the context of rapid adaptation.

In order to understand the population parameters that allow for rapid adaptation in eukaryotes, we study here a well-documented example: the evolution of pesticide resistance in D. melanogaster.

Acetylcholinesterase (AChE), a key neuronal signalling enzyme, is the major target of the most commonly used insecticides, organophosphates (OPs) and carbamates (CMs) [10]. Introduced in the 1950s and 1960s, these insecticides have been used pervasively around the world ever since. Within a few years of their introduction, insecticide-resistant AChE alleles emerged [11], and today insecticide-resistant AChE has been observed and characterized in numerous arthropod species [2]–[7].

In D. melanogaster, four particular point mutations at highly conserved sites (I161V, G265A, F330Y, G368A) of Ace (the gene coding for AChE) lead to resistance to OPs and CMs [5], [12] (Figure S1). Alleles carrying these mutations singly and in combination have been found in natural populations worldwide [12]. In the presence of OPs, these mutations confer semi-additive resistance: single mutations provide moderate levels of resistance to ∼75% of OPs, any two mutations in combination provide higher levels of resistance to ∼80% of OPs, while alleles with three or four mutations lead to strong resistance to practically all OPs [12]. One 3-mutation allele (I161V, G265A, F330Y) was found worldwide at particularly high frequencies and is a key determinant of resistance to OPs [12]. In the absence of pesticides, all resistant alleles are strongly deleterious, with selection coefficients on the order of −5% to −20% [13], [14].

Here we collect data and provide quantitative arguments (both analytical and simulation-based) that the observed signatures of adaptation at Ace imply a much larger (∼100-fold or more) effective population size than is commonly assumed for D. melanogaster. We discuss the implications of our results for the study of adaptation in Drosophila and other species with large census sizes.

Results

Fast and repeated evolution of simple and complex resistance alleles within individual subpopulations of D. melanogaster

D. melanogaster evolved in sub-Saharan Africa (AF) and spread worldwide over the past 10–16 thousand years [15]. The worldwide spread was associated with a severe bottleneck that resulted in sub-sampling of AF diversity by the out-of-Africa strains [15]. Resistant alleles found outside of AF may either have arisen in situ in the derived out-of-Africa populations or were present in the AF population prior to the bottleneck (similar to [8], [9]). These two hypotheses can be distinguished by studying haplotype backgrounds of the resistant alleles. Resistant mutations that evolved in derived populations in situ, unlike ancient AF resistant alleles, should reside on the background of sensitive haplotypes common in the exposed out-of-Africa populations that passed through the bottleneck.
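The haplotype-background test described above can be sketched in a few lines: ignore the resistance sites themselves, then ask which sensitive haplotype a resistant allele most closely resembles. This is only an illustration of the logic, not the analysis pipeline used here. The function names and the short toy sequences are hypothetical and do not come from the Ace data.

```python
def hamming(a, b, skip):
    """Count mismatches between two aligned sequences, ignoring positions in `skip`."""
    return sum(1 for i, (x, y) in enumerate(zip(a, b)) if i not in skip and x != y)


def nearest_background(resistant, sensitive, skip):
    """Return the name of the sensitive haplotype closest to the resistant allele."""
    return min(sensitive, key=lambda name: hamming(resistant, sensitive[name], skip))


# Hypothetical toy alignment (NOT real Ace sequence data).
sensitive_haplotypes = {
    "derived_common": "ACGTACGTAC",
    "african_common": "ATGTTCGAAC",
}
resistant_allele = "ACGTGCGTAC"   # position 4 carries the resistance change
resistance_sites = {4}            # excluded when comparing backgrounds

closest = nearest_background(resistant_allele, sensitive_haplotypes, resistance_sites)
print(closest)
```

A resistant allele whose nearest background is a haplotype common in the derived populations is the pattern expected under an in situ origin.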

We collected D. melanogaster sequence data (∼1.5 kb covering the known four sites of resistant mutations in Ace) from 93 resistant and sensitive strains. We sequenced 9 alleles from the ancestral AF populations, 10 alleles from the derived Eurasian and American populations collected prior to the 1950s (M strains) [16], and 74 alleles from the recently collected (1990–2009) derived populations in North America (NA) and Australia (AUS) (Table S1 and Table S2).

We detected resistant mutations at the first three sites (I161V, G265A, F330Y) but did not find the resistant mutation at the fourth site (G368A). We estimated that ∼40% of the strains contain resistant mutations in the modern NA and AUS populations of D. melanogaster. Figure 1 shows the most parsimonious haplotype network of the sequenced alleles. Figure 2 shows the segregating sites for sensitive haplotypes, as well as the I161V 1-mutation and the 3-mutation haplotypes (Table S2 shows segregating sites for all sequenced alleles).

**Fig. 1. Haplotype network at *Ace*.**

**Fig. 2. Soft sweeps at *Ace*.**

In all cases the NA and AUS resistant alleles show no signs of having predated the spread of D. melanogaster out-of-Africa. Instead, the resistant alleles appear to have arisen in situ in different populations, as indicated by the observation that locally common resistant alleles are present on the locally common sensitive haplotypes. For instance, AUS alleles with the resistant mutation in the first site (marked t) have the haplotype background that is identical to the sensitive haplotype N that is common in AUS but has not been detected by us in NA. In contrast, the NA first site mutation alleles (marked p through s) have the haplotype backgrounds that are nearly identical to the sensitive haplotype L that is common in NA. Additionally, the haplotype background of one of the AF alleles (marked u) with the resistant mutation I161V is substantially diverged from the NA and AUS resistant strains and is more similar to the sensitive alleles common in AF. This suggests a third independent origin of the mutation I161V in AF. Note that the complex 2- and 3-mutation haplotypes also appear to have arisen in situ in the derived populations as their haplotype backgrounds are most closely related to the common out-of-Africa sensitive haplotype L.

In summary, the sequence analysis of the resistant and sensitive alleles reveals two signatures of the adaptive evolution of pesticide resistance at the Ace gene. First, adaptation has been rapid enough such that in the past 50 years (1000 to at most 1500 generations [17]) multiple resistant alleles including a complex allele containing three independent mutations at three different sites evolved and spread to high frequencies worldwide. Second, many resulting resistant alleles are present on distinct haplotypes that differ in the immediate vicinity of the adaptive sites, such as the adaptive change from A to G at the first site (I161V) in NA and AUS that is located on the haplotypes p, q, r, s, and t (Figure 2).

Patterns of evolution at Ace are inconsistent with small values of Θ: analytical considerations under simple scenarios

Below we consider a simple scenario of a single locus in a panmictic population of effective size N_e. We assume that the resistant alleles were in mutation-selection balance prior to pesticide application with a strongly deleterious selection coefficient of −5% [13], [14] and that they became advantageous after the application of pesticides.

In Box 1 we show that if Θ∼0.01 the probability of successful adaptation from standing genetic variation is less than 1% even if positive selection is extremely strong (s∼100%). Thus, if Θ∼0.01, as previously estimated based on analyses of neutral loci in Drosophila [18], [19], we only need to consider the case of adaptation from de novo mutations.

Box 1. Probability of adaptation from standing genetic variation and waiting time for de novo mutation. Consider a single locus in a panmictic diploid population of constant effective size N_e. New resistant alleles arise at rate Θ_u = Θ/3 (only one out of three mutations gives rise to an adaptive allele). Evolution is modelled in a Wright-Fisher infinite alleles framework with selection. Heterozygotes have fitness 1+s, fitness is multiplicative, and the locus evolves independently of other loci. Prior to pesticide application resistant alleles are deleterious with selection coefficient s_d<0. The density function g(x) for the frequency distribution of resistant alleles in mutation-selection balance is then given, for rare alleles (x≪1), by [47]

$$g(x) \approx C\,x^{\Theta_u-1}\,e^{-4N_e|s_d|x}, \tag{B1}$$

where C is a normalization constant. We thus do not expect resistant alleles to be present in the population most of the time for Θ = 0.01 (N_e∼10⁶) and s_d = −5% because the expected number of resistant copies, Θ_u/(2|s_d|)≈0.03, is much smaller than one.

After the onset of pesticide application, resistant alleles become advantageous (s>0). The probability of successful adaptation from standing genetic variation is approximately [22]

$$P_{sgv} \approx 1-\exp\left[-\Theta_u\ln\left(1+\frac{s}{|s_d|}\right)\right]. \tag{B2}$$

Under the above scenario, P_sgv is very low even in the case of extremely strong positive selection (P_sgv∼1% for s∼100%).

Let us now consider de novo resistant mutations that arise after the onset of pesticide application. The average time it takes for an adaptive mutation to emerge and to reach sufficiently high frequency, x∼1/(4N_es), assuring its escape from initial stochastic loss, the so-called establishment time T_e, is on the order of [34], [48]

$$T_e \approx \frac{1}{\Theta_u s}. \tag{B3}$$

Once established, the frequency trajectory x(t) of the adaptive mutation becomes essentially deterministic and can be modelled by [48]

$$x(t) = \frac{1}{1+4N_e s\,e^{-st}}, \tag{B4}$$

with t measured from establishment. From establishment it takes on the order of

$$\frac{\ln(4N_e s)}{s} \tag{B5}$$

generations for the mutation to rise to intermediate population frequency (∼50%). The overall expected waiting time T_w for a de novo adaptive mutation to reach intermediate frequencies is then

$$T_w \approx \frac{1}{\Theta_u s}+\frac{\ln(4N_e s)}{s}, \tag{B6}$$

which is Equation (1) in the main text.
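As a rough numerical check of the standing-variation argument in Box 1, the sketch below evaluates the probability of adaptation from standing variation under an assumed approximation, P_sgv ≈ 1 − exp[−Θ_u ln(1 + s/|s_d|)] with Θ_u = Θ/3. The function names are hypothetical, and the closed form is taken as an assumption rather than as a verbatim transcription of the box.

```python
import math


def expected_resistant_copies(theta, s_d, adaptive_fraction=1.0 / 3.0):
    """Expected number of resistant copies at mutation-selection balance,
    assuming the deterministic balance frequency u/|s_d| among 2*N_e chromosomes."""
    theta_u = theta * adaptive_fraction
    return theta_u / (2.0 * abs(s_d))


def p_sgv(theta, s, s_d, adaptive_fraction=1.0 / 3.0):
    """Assumed approximation for the probability of adaptation from standing variation."""
    theta_u = theta * adaptive_fraction
    return 1.0 - math.exp(-theta_u * math.log(1.0 + s / abs(s_d)))


low = p_sgv(theta=0.01, s=1.0, s_d=-0.05)    # small-theta regime, very strong selection
high = p_sgv(theta=1.0, s=1.0, s_d=-0.05)    # large-theta regime for comparison
copies = expected_resistant_copies(theta=0.01, s_d=-0.05)
print(f"P_sgv(theta=0.01) = {low:.4f}, P_sgv(theta=1) = {high:.4f}, copies = {copies:.3f}")
```

With Θ = 0.01 the probability stays near 1% even for s = 100%, in line with the statement in the box, whereas it becomes substantial once Θ approaches one.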

The probability of successful adaptation from de novo mutations depends on the expected waiting time for an adaptive mutation to emerge and to reach substantial frequencies in the population. This waiting time is the sum of the expected times to complete two distinct phases: (1) the establishment phase, in which an adaptive mutation arises and reaches the frequency at which its escape from stochastic loss is assured, and (2) the sweep phase, in which the adaptive allele reaches an intermediate population frequency such that it can be readily observed. In Box 1 we show that the overall waiting time can be estimated as

$$T_w \approx \frac{1}{\Theta_u s}+\frac{\ln(4N_e s)}{s}, \tag{1}$$

where Θ_u = Θ/3 is the rate at which mutations to the adaptive allele arise.

This equation implies that selection must already be very strong for a single 1-mutation allele to arise and to become prevalent in less than 1500 generations (s>20% for Θ = 0.01). Selection coefficients associated with the 2-mutation and 3-mutation alleles need to be even stronger given that they have to outcompete the 1-mutation and 2-mutation alleles respectively.
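To make the dependence on s and Θ concrete, the following sketch evaluates the waiting time as the sum of an establishment term 1/(Θ_u s) and a sweep term ln(4N_e s)/s. This form is an assumption consistent with the two phases described above. The function name is hypothetical, and N_e is paired with Θ using μ∼2.5×10⁻⁹.

```python
import math


def waiting_time(theta, s, n_e, adaptive_fraction=1.0 / 3.0):
    """Approximate generations until a de novo adaptive mutation reaches ~50%.

    Assumed form: establishment time 1/(theta_u * s) plus sweep time ln(4*N_e*s)/s.
    """
    theta_u = theta * adaptive_fraction
    t_establish = 1.0 / (theta_u * s)
    t_sweep = math.log(4.0 * n_e * s) / s
    return t_establish + t_sweep


# Small-theta regime: theta = 0.01 corresponds to N_e ~ 1e6 for mu ~ 2.5e-9.
t_small = waiting_time(theta=0.01, s=0.2, n_e=1e6)
# Large-theta regime: theta = 1 corresponds to N_e ~ 1e8 for the same mu.
t_large = waiting_time(theta=1.0, s=0.2, n_e=1e8)
print(f"theta=0.01: {t_small:.0f} generations; theta=1: {t_large:.0f} generations")
```

For Θ = 0.01 the establishment term alone accounts for about 1500 generations at s = 20%, which is why selection of at least this strength is required in the small-Θ regime.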

We have established that under this simple model if Θ is 0.01, the adaptation at Ace likely involved very strong positive selection acting on de novo mutations. Can we then explain the second empirical observation, namely that the same adaptive mutation by state is observed on several haplotypes that differ in the immediate vicinity of the adaptive site?

We can imagine two scenarios that would generate this observation. In the first, the so-called hard sweep scenario, a single adaptive mutation rises in frequency in the population and eventually ends up on different haplotypes due to recombination or mutation events that take place in its vicinity during the sweep. In the other, an example of the so-called soft sweep scenario, several independent adaptive mutations take place on different haplotypes and increase in frequency simultaneously.

Theoretical investigations under simple scenarios by Pennings and Hermisson [20]–[22] showed that such soft sweeps are extremely uncommon if Θ per site is on the order of 0.01, independently of the strength of positive selection. The probability that the hard sweep scenario produces the observed haplotypic diversity in the vicinity of the adaptive allele is calculated in Box 2. Specifically, we demonstrate that the probability P_d that at least two haplotypes are observed at the end of the sweep, where the minor haplotype is present in at least a fraction d of the population, is approximately

$$P_d \approx 1-\exp\left(-\frac{R}{sd}\right) \approx \frac{R}{sd}. \tag{2}$$

Box 2. Probability of distinct haplotypes in a hard sweep. Consider a single adaptive mutation that reaches establishment frequency in generation t = 0. Its subsequent frequency trajectory is x₁(t). Mutated or recombined variants of its original haplotype become established in the population at rate

$$\lambda(t) = 2N_e\,x_1(t)\,R\cdot 2s = 4N_e s R\,x_1(t). \tag{B7}$$

Here R is the rate of either mutation or recombination taking place on the sweeping initial haplotype per individual per generation, and the factor 2s is the probability that an adaptive mutation escapes the initial stochastic loss. Note that this is an overestimate of the establishment probability of the second haplotype. If x₁ is substantial in frequency, establishment occurs with a probability that is closer to 2s(1−x₁).

What is the probability that a mutated or recombined haplotype also reaches at least frequency d in the population? Such a second haplotype has to emerge within a limited number T_d of generations after the first. Otherwise the population will already be dominated by the first haplotype, neutralizing any selective advantage of the second.

Let us assume that the second haplotype becomes established at time T_d. We denote its frequency trajectory by x₂(t). The crucial observation allowing us to calculate T_d is that the ratio x₁(t)/x₂(t) remains constant for all t≥T_d, as both haplotypes have the same fitness. In particular, because we require that the second haplotype is eventually present in a fraction d of the population, we have

$$\frac{x_1(t)}{x_2(t)} = \frac{1-d}{d} \approx \frac{1}{d}. \tag{B8}$$

The latter approximation applies for small d≪1. At t = T_d, the trajectory x₁(t) can still be modelled by Equation (B4); no interference between different adaptive haplotypes has occurred until then because the second allele has been extremely rare or absent for t<T_d. Recalling that the second haplotype becomes established when it reaches a frequency x₂∼1/(4N_es), we thus have:

$$\frac{1}{1+4N_e s\,e^{-sT_d}} = x_1(T_d) \approx \frac{x_2(T_d)}{d} \approx \frac{1}{4N_e s d}. \tag{B9}$$

Solving Equations (B8) and (B9) for T_d and assuming N_esd≫1 yields

$$T_d \approx \frac{1}{s}\ln\frac{1}{d}. \tag{B10}$$

The condition N_esd≫1 is justified when positive selection is strong and d is large enough that there is a chance of sampling the second allele.

Variants establishing after T_d can only reach a population frequency smaller than d. The probability of observing two different haplotypes with the minor haplotype being present in at least a small fraction d of the population can therefore be estimated from the probability that a new variant of the initial adaptive haplotype emerges within the first T_d generations. We obtain

$$P_d = 1-\exp\left(-\int_0^{T_d}\lambda(t)\,dt\right) \approx 1-\exp\left(-\frac{R}{sd}\right), \tag{B11}$$

where we again assumed N_esd≫1. Note that P_d does not depend on Θ.

Here R is the total rate of mutation or recombination in the locus per individual per generation and s is the strength of positive selection. For our locus of length ∼1500 bp we have R∼6×10⁻⁶ when assuming a recent estimate for the single-site mutation rate in D. melanogaster of μ∼2.5×10⁻⁹ [23] and a measured recombination rate of ρ∼0.15 cM/Mbp [24]. The probability of observing different haplotypes is therefore still very small (P_d<1%) even for a low population frequency of d = 2% and assuming s to be 5%. Note that this calculation is very conservative given that in our data multiple haplotypes are present at much higher frequency than 2%, multiple haplotypes vary at sites extremely close to the adaptive allele (within 38 bp), and positive selection was likely much stronger.
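The arithmetic behind these numbers can be reproduced with a short sketch. It assumes that R is the locus length multiplied by the sum of the per-site mutation and recombination rates (with 1 cM taken as a 1% recombination probability), and that the hard-sweep probability is approximated by P_d ≈ 1 − exp(−R/(sd)), with a time window T_d ≈ ln(1/d)/s. The function names are hypothetical.

```python
import math

LOCUS_BP = 1500           # approximate length of the sequenced region
MU = 2.5e-9               # per-site mutation rate per generation [23]
RHO_CM_PER_MBP = 0.15     # recombination rate [24]


def rate_per_locus(length_bp, mu, rho_cm_per_mbp):
    """Total mutation-or-recombination rate R per generation over the locus."""
    rho_per_bp = rho_cm_per_mbp * 0.01 / 1e6   # convert cM/Mbp to probability per bp
    return length_bp * (mu + rho_per_bp)


def time_window(s, d):
    """Assumed window T_d in which a second haplotype must establish to reach frequency d."""
    return math.log(1.0 / d) / s


def p_distinct_haplotypes(r, s, d):
    """Assumed hard-sweep approximation P_d ~ 1 - exp(-R / (s * d))."""
    return 1.0 - math.exp(-r / (s * d))


R = rate_per_locus(LOCUS_BP, MU, RHO_CM_PER_MBP)
P_d = p_distinct_haplotypes(R, s=0.05, d=0.02)
T_d = time_window(s=0.05, d=0.02)
print(f"R = {R:.2e}, T_d = {T_d:.0f} generations, P_d = {P_d:.4f}")
```

Because R/(sd) contains no population-size term, raising N_e in this sketch changes nothing. The lack of dependence on Θ is why a hard sweep cannot account for multiple haplotypes.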

In conclusion, under this simple scenario, our empirical observations at Ace are unexpected if Θ is indeed on the order of 0.01. Specifically, considering how strong selection must be, we should not be seeing more than one distinct haplotype containing the same adaptive mutation.

Note that if Θ were much higher, for example on the order of one or larger, then all of our observations are expected. Soft sweeps would be commonplace because many more mutations enter the population in every generation and can increase in frequency simultaneously thereby generating multiple haplotypes containing the same adaptive mutations [22], as observed in the data. The establishment time would become smaller making it easier to observe complex, 3-mutation alleles at Ace in less than 1500 generations. However, selection would still need to be strong because the time it takes for an adaptive allele to reach intermediate frequencies is only weakly (logarithmically) dependent on the effective population size and inversely proportional to the selection coefficient.

Patterns of evolution at Ace require large values of Θ: numerical investigations for a large range of evolutionary scenarios

We have shown above that under very simple population scenarios the pattern of adaptive evolution at Ace requires large values of Θ. However, it is unclear whether such large values of Θ are required under more complex and realistic scenarios. Variation in strength of selection, recombination rate, and population structure might affect the probability of evolving complex 3-mutation alleles from the simpler 1- or 2-mutation alleles [25] and the probability of observing multiple haplotypes containing the same adaptive mutations.

To investigate quantitatively the potential impact of such effects we conducted extensive simulations of adaptation at Ace under a large number of selective (s = 2.5% to 500%) and demographic scenarios (1 to 100 subpopulations, migration rates M = 0.01 to 10 individuals per generation between any two subpopulations), and with varying recombination rate (ρ = 0 to 10 cM/Mbp) (Table S3).

In Figure 3A and 3B we show the frequency trajectories of adaptive haplotypes for two representative simulation runs in a simple single population scenario together with summary statistics across a large number of runs for the two key Θ regimes (Θ = 0.01 and Θ = 1). We use four statistics: P_1m and P_3m are the probabilities that a single adaptive mutation (1m) allele or the 3-mutation allele (3m) were ever present in at least 10% of the population during the simulation; P_ss is the probability that a single adaptive mutation is present on distinct haplotypes in a sample of reasonable size (the observation that we will call the soft sweep signature from now on); and P_c is the combined probability of observing both the complex 3-mutation allele and a single-mutation soft sweep signature during the same simulation. Figure 3A and 3B show results consistent with our analytical considerations. When Θ∼0.01 and selection is of moderate strength, neither the evolution of complex 3-mutation alleles nor soft sweep signatures are likely. Only when Θ approaches one do both observations become commonplace.

**Fig. 3. Population dynamics of resistance adaptation for different Θ regimes.**

Figure 3C shows the summary of the results for the more complex scenarios (complete results are shown in Table S3). In these more complex scenarios we assessed Θ by using coalescent simulations to estimate the average heterozygosity per site (Θ_π) at neutral sites and by summing Θ across all subpopulations (Θ_Σ) [26]. Our simulations confirm that only when both Θ_π and Θ_Σ become on the order of one or larger is it likely to observe fast evolution of complex 3-mutation alleles and at the same time soft sweep signatures. Strong selection does indeed improve the probability of seeing complex adaptive alleles but also, as expected, does not generate signatures of soft sweeps when Θ is small.

Interestingly our simulations show that if Θ∼1, then most of the observed signatures of soft sweeps are generated by multiple de novo mutations and are not due to the recombination of the same adaptive mutation onto different haplotypes. This is because signatures of soft sweeps are still commonly observed in simulations even when the recombination level is set at zero. It is also consistent with analytical considerations under simple scenarios (Text S1).
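The qualitative effect of mutation supply on soft sweeps can be explored with a toy forward simulation. The sketch below is not the simulation framework used for Figure 3 or Table S3. It is a minimal haploid Wright-Fisher model with a single resistance site, no recombination, and a deliberately small population, in which every new resistant mutation receives its own origin label. All function names and parameter values are illustrative assumptions.

```python
import random


def simulate_sweep(pop_size, mut_rate, s, generations, seed=0):
    """Haploid Wright-Fisher sweep with recurrent mutation to a resistant allele.

    Returns a dict mapping each independent origin ID to its final frequency.
    """
    rng = random.Random(seed)
    pop = [0] * pop_size      # 0 = sensitive; k > 0 = k-th independent resistant origin
    next_origin = 1
    for _ in range(generations):
        # Selection and drift: resample parents in proportion to fitness.
        weights = [1.0 + s if label else 1.0 for label in pop]
        pop = rng.choices(pop, weights=weights, k=pop_size)
        # Mutation: each sensitive offspring may become a new resistant lineage.
        for i, label in enumerate(pop):
            if label == 0 and rng.random() < mut_rate:
                pop[i] = next_origin
                next_origin += 1
    counts = {}
    for label in pop:
        if label:
            counts[label] = counts.get(label, 0) + 1
    return {origin: n / pop_size for origin, n in counts.items()}


def origins_above(freqs, min_freq):
    """Number of independent origins present at or above a frequency threshold."""
    return sum(1 for f in freqs.values() if f >= min_freq)


# Two illustrative regimes for a population of 500 haploids (2*N*u ~ 0.01 vs ~ 1).
for u in (1e-5, 1e-3):
    result = simulate_sweep(pop_size=500, mut_rate=u, s=0.2, generations=150, seed=1)
    print(u, origins_above(result, min_freq=0.05))
```

Counting origins at or above a frequency threshold mirrors the idea of a soft sweep signature: several independently arisen copies of the same mutation segregating together. Individual runs are stochastic, so any comparison between regimes should be averaged over many seeds.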

Discussion

Our data and analysis strongly suggest that the patterns of adaptation observed at Ace in the last 1000–1500 generations are highly unlikely in a population in which Θ per site is on the order of 0.01 as it is commonly assumed. Instead, it appears that Θ per site must have been at least 0.1 and more likely on the order of one or larger. It is possible to elevate Θ by increasing the mutation rate or by increasing the effective population size. We assessed whether Ace had an unusually high mutation rate by estimating divergence of Ace in D. melanogaster from its D. simulans ortholog at synonymous sites. We found the divergence to be 7.9%, which is similar to the genome average of ∼10% [27], [28]. In addition, Θ_π per site estimated from polymorphisms at synonymous sites in sensitive alleles is 0.008, which is also consistent with the genome average [18]. Thus we conclude that the effective population size in D. melanogaster over the past 1000–1500 generations is likely to be very large (N_e≥10⁸).
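The step from Θ to N_e follows directly from Θ = 4N_eμ. A minimal sketch, assuming the per-site mutation rate μ∼2.5×10⁻⁹ quoted for D. melanogaster [23], with hypothetical function names:

```python
MU = 2.5e-9   # per-site mutation rate per generation [23]


def theta_per_site(n_e, mu=MU):
    """Population mutation parameter theta = 4 * N_e * mu."""
    return 4.0 * n_e * mu


def n_e_for_theta(theta, mu=MU):
    """Effective population size implied by a given per-site theta."""
    return theta / (4.0 * mu)


print(theta_per_site(1e6))    # the commonly assumed N_e ~ 1e6
print(n_e_for_theta(0.1))     # N_e implied by the lower bound theta ~ 0.1
print(n_e_for_theta(1.0))     # N_e implied by theta ~ 1
```

Under this mutation rate, Θ on the order of one corresponds to N_e on the order of 10⁸, roughly 100-fold above the value implied by Θ∼0.01.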

Such a large value of N_e might appear puzzling given that levels of standing neutral polymorphism suggest that N_e is much smaller [18], [19]. To resolve this discrepancy it is necessary to take a closer look at the concept of an effective population size. Effective population size is commonly defined by the inverse magnitude of the frequency-fluctuations of a neutral allele in two consecutive generations [1]. Over a number of generations, effective population size is the harmonic mean of the effective population sizes over individual generations and thus is dominated by the smallest values of N_e. (Equivalently, frequency fluctuations over many generations are dominated by the largest fluctuations over single generations). Estimates of the effective population size using frequent neutral polymorphisms reflect N_e harmonically averaged over long periods of time and are therefore very sensitive to any periods of low population size even far back into the past [29].

In sharp contrast, adaptation at Ace occurred within less than 1500 generations. The N_e relevant to adaptation at Ace is the harmonic mean of N_e values over the past 1500 generations or even fewer. Unlike N_e measured from ancient standing variation, it is not reduced by the bottlenecks and nearby selective sweeps that occurred more than 1500 generations ago. Consider a simple bottleneck scenario outlined in Figure 4 that is similar to the out-of-Africa scenario of Thornton and Andolfatto [18]. It is apparent that even if the current N_e is 100-fold larger than commonly assumed, population behaviour of a frequent neutral allele does not change substantially and the estimates of Θ from standing variation are not altered. To give another example, if D. melanogaster populations were to spend 90% of their time with N_e of 10¹⁰ and 10% at N_e of 10⁵ with the shifts occurring about every 1000 generations, the harmonic mean N_e derived from common neutral polymorphisms would be ∼10⁶ and yet the adaptive process would take place primarily in populations of 10¹⁰ with Θ>1 per site. In this case, strong adaptation in Drosophila would not be limited by mutation most of the time.
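The harmonic-mean example above can be checked directly. The sketch below computes a time-weighted harmonic mean of N_e across phases. The function name and the representation of phases as (fraction of time, N_e) pairs are illustrative choices.

```python
def harmonic_mean_ne(phases):
    """Time-weighted harmonic mean of N_e.

    `phases` is a list of (fraction_of_time, n_e) pairs.
    """
    total_time = sum(fraction for fraction, _ in phases)
    return total_time / sum(fraction / n_e for fraction, n_e in phases)


# 90% of the time at N_e = 1e10 and 10% of the time at N_e = 1e5.
long_term_ne = harmonic_mean_ne([(0.9, 1e10), (0.1, 1e5)])
print(f"{long_term_ne:.3e}")
```

The result is very close to 10⁶: the rare small-population phases dominate the long-term average even though the population spends most of its time at 10¹⁰.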

**Fig. 4. Population dynamics of neutral and adaptive alleles in a population with a bottleneck.**

The short-term N_e is bounded by the census population size (N) and thus if N is much smaller than the reciprocal of the mutation rate per site we can be certain that adaptation would be mutation-limited. In many species N can be much larger than the reciprocal of mutation rate and thus in these species it is possible that adaptation is not limited by mutation at single sites. However, it is N_e measured over time scales relevant for adaptation and not N that needs to be assessed to answer this question. Even short-term N_e might be much smaller than N if populations crash regularly on very fast temporal scales (such as those induced by winters in temperate climates) or if the numbers of successfully reproducing adults in each generation is sharply limited by extrinsic factors, for example by available substrates for laying eggs. Thus the studies of strong adaptation, such as the one presented here, are essential to determining whether adaptation in general is mutation-limited in a species.

It is reasonable that Drosophila and many other organisms undergo recurrent boom-bust cycles thereby reducing the long-term N_e strongly but allowing adaptation during the boom years to occur in populations of large short-term N_e. In addition, Drosophila appears to undergo pervasive adaptation [30], [31] with most common neutral polymorphisms estimated to have been affected by several selective sweeps in their genomic vicinity [28]. Such pervasive adaptation generates dynamics similar to recurrent bottlenecks and will also reduce the long-term N_e values even if the short-term N_e might be consistently large. This situation is similar to that found in HIV, where the effective population size estimated from observed diversity underestimates the census size by many orders of magnitude and is likely to underestimate the short-term N_e relevant for adaptation as well [32].

The possibility that adaptation at single sites in D. melanogaster is not limited by mutation has profound implications. The distinction between standing variation and de novo mutations at single sites is blurred since virtually all single-site mutations then exist in the D. melanogaster population at any given time. Strong adaptation should be much more rapid and generally result in soft sweeps. Complex adaptations that require multiple changes can be generated without fixation of interim states and with an enhanced chance of crossing fitness valleys [33]. This raises the question of whether the widespread use of the weak mutation, strong selection (“WMSS”) model for the study of adaptation should be broadened to include cases of strong mutation [34], [35].

The number of sweeps (hard or soft) might also in general be lower than the number of adaptive substitutions if complex adaptations requiring multiple substitutions are common. Indeed, in our simulations of evolution at Ace in the strong mutation regime (Θ per site on the order of 1), the complex 3-mutation alleles generally evolve without fixation of intermediate 1- and 2-mutation alleles (Figure 3). The number of adaptive substitutions estimated using McDonald-Kreitman approaches should then be larger than the number of independent adaptive fixations and the prediction of the number of selective sweeps derived from the number of adaptive substitutions should be upwardly biased [36].

Note that all of these expectations hold especially well for strong selection because it operates over shorter time scales and is therefore less sensitive to recurrent but infrequent bottlenecks [37] and neighbouring selective sweeps.

Most of the current statistical approaches for the study of adaptation rely on the expected signatures of hard sweeps [30]. Such methods should regularly miss or misidentify strong adaptation if it in fact commonly involves soft sweeps as in the case of Ace [20]. For example, if one searches exclusively for hard sweeps, then complete soft sweeps might appear as ongoing hard sweeps and the polymorphisms associated with the most frequent haplotype would appear as the likeliest candidates for the adaptive mutation whereas the true adaptive mutation would be fixed in the population. Methods exist that have high power to detect soft sweeps [20], but they are used less often because soft sweeps have been considered unlikely a priori. However, a number of cases of adaptation in Drosophila and mosquitoes show clear signatures of soft sweeps [38]–[40]. Soft sweeps might also be common in humans, with the soft sweep associated with lactase persistence providing the strongest signature of adaptation in humans [41], [42]. Our results suggest that the possibility of pervasive soft sweeps needs to be taken seriously.

Recurrent boom-bust cycles are a general feature in population dynamics of most studied organisms. Adaptation and recurrent selective sweeps reducing the long-term but not the short-term N_e might also be common. It follows then that short-term and long-term N_e values are likely to be different as a rule. The shortest term N_e is only bounded by the census population size, which is often very large and can easily be in the billions, particularly for insects or marine organisms. It is thus possible that strong adaptation at single sites may not be limited by mutation in many eukaryotes, similar to the situation found in bacteria and viruses [32].

Materials and Methods

Ace locus genotyping

We sequenced 1450 bp encompassing exons 2 through 4 of Ace. Resistant mutations I161V and G265A lie in the 3rd exon while F330Y and G368A lie in the 4th exon (Figure S1). Initially we sequenced this locus in 68 strains from 20 populations chosen to represent the Ace locus in a variety of geographical locations. The list of the populations and the number of lines investigated are given in Table S1 and Table S2. For some of the strains that appeared heterozygous after sequencing of the PCR product, the DNA was first amplified using a proofreading DNA polymerase (Platinum Pfx; INVITROGEN) and cloned using Zero Blunt TOPO PCR cloning kit (INVITROGEN) before sequencing. Note that not all heterozygous strains were cloned, only those that contained a resistant mutation and the AF strains. The primers used for PCR amplification of the Ace locus were:

Ace1F: gctggttagtttgccgtaat
Ace1R: ccatgatatccgcattgtaga
Ace2F: aatccgcagaacacgaccaac
Ace2R: cgtgagcgggattggtct
Ace3F: gccttaacgcgtcactcac
Ace3R: aagcttggcaaacaacattgg

PCR products were then sequenced. Of the 68 sequenced strains, 26 (∼40%) carry one or more resistant mutations. The mutations I161V, G265A and F330Y were identified in isolation and in combination in multiple populations, while G368A was never observed. We then used PASA [43] to identify strains that contained one or more of the three observed mutations and sequenced the identified strains. The primers used for PASA were:

161-F: ccggatcggccaccctggaca
161-R: agtcgttgatcagcgccttgc
265-F: gcgcggaatgatgcagtcggg
265-R: atcaatggtgggcgccgagg
330-F: gaagaggcgcccggcaatgtg
330-R: atggtgggcgccgagggata

The 161 primer pair amplifies more effectively in the presence of the mutation I161V. The 265 primer pair is specific to G265A and the 330 primer pair is specific to F330Y. The annealing temperatures required for allele-specific priming with the 161, 265 and 330 primer pairs were 61.5°C, 59.5°C and 60.6°C, respectively. As positive and negative controls we performed PASA on strains in which the resistant sites had been previously characterized. We sequenced 37 strains from 8 populations that had amplified with one or more of the allele-specific primers. Of these, 31/37 (84%) contained resistant mutations. The incorrect classification of the remaining 6 strains is likely due to the addition of excess template to these PCR reactions, resulting in non-specific priming. In total, we sequenced the Ace locus in 105 strains from 27 populations from five different continents (Table S1). Twelve of these strains were excluded from the analysis due to poor sequence quality.

Construction of haplotype network

The most parsimonious haplotype network was constructed using TCS 1.21 [44]. All resistant alleles, except those for which we had poor sequence data, and all sensitive alleles observed more than once were used for the construction of the network. All AF strains and M strains were also included in the network to provide information on ancestral and modern variation respectively at the Ace locus.

Estimation of Θ_π and divergence

Measures of Θ_π and divergence with Drosophila simulans at the Ace locus were obtained using DnaSP [45]. All sensitive strains analyzed in this study were used for the estimation.

Forward simulations of Ace adaptation

Our simulation models the population frequency dynamics of haplotypes at the 1.5 kb-long sequenced Ace locus and incorporates mutation, recombination, selection, and population substructure.

Haplotypes are classified by their particular adaptive allele configuration at the three adaptive sites. We describe this configuration in terms of a vector a₁a₂a₃, indicating whether at site i the resistance-conferring mutation is present (a_i = 1) or not (a_i = 0). A configuration 101, for example, specifies resistant mutations at sites one and three, but no resistant mutation at site two.

We use an infinite alleles model for new haplotypes, i.e. every mutation or recombination event at the locus is assumed to give rise to a new haplotype, which can be distinguished from all other haplotypes in the population. This is implemented in our simulations by assigning a unique ID to every new haplotype. The specific nucleotide sequence of the new haplotype is not relevant for our purposes; only changes in the adaptive-allele configuration are modelled explicitly. We also do not distinguish different sensitive haplotypes as we focus on the population dynamics of adaptive haplotypes. These simplifications substantially increase the performance of our simulations, allowing us to investigate scenarios with population sizes up to 10⁹ in reasonable run-time.

Mutations at adaptive sites and recombination events where the recombination breakpoint lies between two adaptive sites can generate new haplotypes with different adaptive-allele configuration (Table S4). Note that at each site only one specific nucleotide is the resistant allele and thus only one out of three mutations of a sensitive allele will give rise to it.
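The way mutation and recombination change an adaptive-allele configuration can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' C++ code and not a reproduction of Table S4: the function names `gain_resistance` and `recombine`, the 0-based site indexing, and the convention that a recombinant takes the sites left of the breakpoint from one parent and the remaining sites from the other are all hypothetical choices made here.

```python
# Probability that a mutation hitting a sensitive adaptive site creates the
# resistant allele: only one of the three possible nucleotide changes does.
P_RESISTANT_GIVEN_HIT = 1.0 / 3.0


def gain_resistance(config: str, site: int) -> str:
    """Return the configuration with the resistant allele set at `site` (0-based)."""
    if not 0 <= site < len(config):
        raise ValueError("site index out of range")
    return config[:site] + "1" + config[site + 1:]


def recombine(left_parent: str, right_parent: str, cut: int) -> str:
    """Combine two configurations at a breakpoint.

    Sites with index < cut come from `left_parent`, the rest from
    `right_parent`. Only cut = 1 or cut = 2 (a breakpoint between two
    adaptive sites) can produce a configuration that differs from both parents.
    """
    if len(left_parent) != len(right_parent):
        raise ValueError("configurations must have equal length")
    if not 0 <= cut <= len(left_parent):
        raise ValueError("breakpoint out of range")
    return left_parent[:cut] + right_parent[cut:]
```

For example, `recombine("100", "011", 1)` yields the complex allele `"111"` in a single step, which is how recombination can assemble a 3m haplotype from a 1m and a 2m haplotype.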

The evolution of haplotype frequencies is simulated in terms of a Wright-Fisher model with directional selection, i.e. we assume panmictic subpopulations of constant size and non-overlapping generations [46]. Every haplotype h has a specific selection coefficient s(h). The mean fitness of a subpopulation at time t is $\bar{w}(t) = \sum_h x_h(t)\,[1 + s(h)]$, where x_h(t) is the frequency of haplotype h in the subpopulation at time t. Haplotype frequencies in generation t+1 are obtained by sampling from a multinomial distribution B(2N,{p_h}) with selection-adjusted probabilities $p_h = x_h(t)\,[1 + s(h)] / \bar{w}(t)$.
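A single Wright-Fisher generation with selection can be sketched as follows. The sketch assumes the standard form in which haplotype h has relative fitness 1 + s(h), so that sampling probabilities are frequencies weighted by fitness and normalised by the mean fitness. The function names and the dictionary representation are illustrative choices, not taken from the authors' C++ implementation, and the sampling uses Python's standard library, so it is practical only for small populations.

```python
import random
from collections import Counter


def mean_fitness(freqs, s):
    """Mean fitness: sum over haplotypes of x_h * (1 + s(h))."""
    return sum(x * (1.0 + s[h]) for h, x in freqs.items())


def selection_adjusted_probs(freqs, s):
    """Sampling probabilities p_h = x_h * (1 + s(h)) / mean fitness."""
    w_bar = mean_fitness(freqs, s)
    return {h: x * (1.0 + s[h]) / w_bar for h, x in freqs.items()}


def next_generation(freqs, s, n_diploid, rng):
    """Draw 2N haplotypes multinomially and return the new frequencies."""
    probs = selection_adjusted_probs(freqs, s)
    haplotypes = list(probs)
    weights = [probs[h] for h in haplotypes]
    draws = rng.choices(haplotypes, weights=weights, k=2 * n_diploid)
    counts = Counter(draws)
    # Haplotypes that were not sampled are lost from the subpopulation.
    return {h: c / (2 * n_diploid) for h, c in counts.items()}
```

With frequencies 0.9 (sensitive, s = 0) and 0.1 (resistant, s = 0.5), the mean fitness is 1.05 and the resistant haplotype is sampled with probability 0.15/1.05 ≈ 0.143.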

We group resistant haplotypes into three classes according to the number of resistance-conferring mutations they bear: 1m haplotypes have one resistant allele (100,010,001), 2m haplotypes have two (011,101,110), and 3m haplotypes have all three resistant alleles (111). For simplicity, we assume that all haplotypes in the same class have equal selection coefficients s_1m, s_2m, and s_3m, respectively. Prior to pesticide application all resistant haplotypes are modelled to be deleterious with selection coefficient −s_1m.
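The grouping into 1m, 2m and 3m classes amounts to counting resistant alleles in the configuration string. The sketch below illustrates this; the function names, the boolean `pesticide` flag, and the choice of s = 0 for sensitive haplotypes (taken here as the fitness reference) are assumptions made for illustration.

```python
def resistance_class(config: str) -> int:
    """Number of resistance-conferring mutations in a configuration such as '101'."""
    return config.count("1")


def selection_coefficient(config, s_1m, s_2m, s_3m, pesticide):
    """Selection coefficient of a haplotype given its configuration.

    Sensitive haplotypes are assumed to be the reference (s = 0).
    Before pesticide application every resistant haplotype is deleterious
    with coefficient -s_1m; afterwards the class coefficient applies.
    """
    k = resistance_class(config)
    if k == 0:
        return 0.0
    if not pesticide:
        return -s_1m
    return {1: s_1m, 2: s_2m, 3: s_3m}[k]
```

For instance, configuration `101` falls into the 2m class and receives s_2m once pesticide application has started, but −s_1m during the burn-in.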

The key simulation parameters are the selection scenario defined by the selection coefficients s_1m, s_2m, and s_3m, the recombination rate ρ, the number n of subpopulations, the migration rate M between subpopulations, and the value of Θ within subpopulations. We use a constant mutation rate of μ = 2.5 × 10⁻⁹ per site per generation [23]. Different Θ-values thus correspond to different subpopulation sizes. In particular, Θ = 0.01 corresponds to N = 10⁶, and Θ = 1.0 corresponds to N = 10⁸. We estimated a recombination rate of ρ = 0.15 cM/Mbp for our locus [24], but also investigate other recombination rates in our simulations.
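The correspondence between Θ and subpopulation size follows from Θ = 4Nμ at the fixed mutation rate given above. A minimal sketch of the conversion (the function name is an illustrative choice):

```python
MU = 2.5e-9  # mutation rate per site per generation, as stated in the text


def subpopulation_size(theta, mu=MU):
    """Solve theta = 4 * N * mu for the subpopulation size N."""
    return theta / (4.0 * mu)


for theta in (0.01, 1.0):
    print(f"theta = {theta}: N = {subpopulation_size(theta):.3g}")
```

This reproduces the two values quoted in the text: N = 10⁶ for Θ = 0.01 and N = 10⁸ for Θ = 1.0.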

Simulation runs start with one single sensitive haplotype present in all subpopulations at 100% frequency. Before pesticide application commences, mutation-selection equilibrium of resistant haplotypes is established within a burn-in period of 1000 generations. This fully suffices to establish equilibrium due to the strong purifying selection against all resistant haplotypes prior to pesticide application (Box 1). We also verified that longer burn-in times do not change our results. After the burn-in period, pesticide application starts by switching to the corresponding selection scheme. The simulation is then followed for another 1500 generations representing approximately 50 years of pesticide usage. During every generation individual subpopulations evolve according to the following steps:

1. A random number of mutation events is drawn from a Poisson distribution with mean μ × 1.5 kb × 2N. For each mutation a random haplotype is drawn from the subpopulation and mutated at a randomly chosen position.
2. A random number of recombination events is drawn from a Poisson distribution with mean ρ × 10⁻⁸ × 1.5 kb × 2N. For each recombination event two random haplotypes are drawn from the subpopulation and recombined at a randomly chosen breakpoint.
3. The numbers of migrating individuals to each other subpopulation are drawn from a Poisson distribution with mean M. For each migrating individual two random haplotypes are drawn from the source population and added to the destination subpopulation.
4. All haplotype frequencies are evolved one generation according to the above-described multinomial sampling procedure.
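To give a sense of scale for the per-generation event counts in the steps above, the Poisson means can be evaluated directly. The sketch below uses the parameter values stated in the text; the function names are illustrative, and the sampler is Knuth's simple multiplication method, which is an assumption made here for self-containedness and is suitable only for small means (it breaks down once exp(−mean) underflows, so it would not serve for the largest population sizes).

```python
import math
import random

LOCUS_BP = 1500  # length of the simulated locus (1.5 kb)


def expected_mutations(n, mu=2.5e-9, locus_bp=LOCUS_BP):
    """Mean number of mutation events per generation: mu * L * 2N."""
    return mu * locus_bp * 2 * n


def expected_recombinations(n, rho_cm_per_mbp=0.15, locus_bp=LOCUS_BP):
    """Mean number of recombination events: rho * 1e-8 * L * 2N.

    The factor 1e-8 converts cM/Mbp into a per-bp, per-generation rate.
    """
    return rho_cm_per_mbp * 1e-8 * locus_bp * 2 * n


def poisson_draw(mean, rng):
    """Draw a Poisson variate by Knuth's method (small means only)."""
    limit = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1
```

For a subpopulation of N = 10⁶ these means are 7.5 mutation events and 4.5 recombination events per generation across the whole locus.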

During a simulation run we analyze whether resistant haplotypes emerged and whether soft sweep signatures among 1m haplotypes were observed. We define 1m resistance by at least one of the three 1m adaptive-allele configurations (001, 010, or 100) ever being present in more than 10% of the population during the run. Accordingly, 3m resistance is defined by the complex 3-mutation allele (111) ever present in at least 10% of the population. A soft sweep signature (ss) is ascertained if at any time during the run two independently drawn alleles have greater than 10% probability to bear the same 1m configuration on different haplotypes. The statistics P_1m, P_3m, and P_ss are the respective probabilities averaged over many runs. P_c denotes the combined probability that 3m resistance emerged and a soft sweep signature was observed during the same run.
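One possible reading of the soft sweep criterion can be made concrete as follows. The sketch assumes that the two alleles are drawn with replacement from the haplotype frequencies and that the probability is summed over the three 1m configurations; the text does not spell out either point, so both are assumptions, as are the function names and the input format (one `(configuration, frequency)` pair per unique haplotype ID).

```python
from collections import defaultdict

ONE_M_CONFIGS = ("100", "010", "001")


def soft_sweep_probability(haplotypes):
    """Probability that two drawn alleles share a 1m configuration
    but sit on different haplotypes.

    For each configuration c this is (sum of x_h)^2 - sum of x_h^2,
    taken over the distinct haplotypes h carrying c.
    """
    total = defaultdict(float)
    total_sq = defaultdict(float)
    for config, freq in haplotypes:
        if config in ONE_M_CONFIGS:
            total[config] += freq
            total_sq[config] += freq * freq
    return sum(total[c] ** 2 - total_sq[c] for c in total)


def has_soft_sweep_signature(haplotypes, threshold=0.10):
    """True if the probability exceeds the 10% threshold used in the text."""
    return soft_sweep_probability(haplotypes) > threshold
```

Two distinct haplotypes that both carry configuration `100` at frequency 0.3 each give a probability of 0.36 − 0.18 = 0.18 and thus a soft sweep signature, whereas a single `100` haplotype at frequency 0.6 gives 0, the hard sweep case.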

A crucial assumption of our simulation is the applicability of an infinite alleles model, i.e. all mutation or recombination events are assumed to be detectable. This can lead to an overestimation of the probability of observing soft sweep signatures in our simulations if independent mutation events frequently occur on the same haplotype, or if newly recombined haplotypes often resemble haplotypes already present in the population. We can estimate the resulting error from the probability that an individual is homozygous for the 1.5 kb-long locus. From coalescent simulations using ms [26] we infer it to be on the order of 10% when assuming a per-site heterozygosity of out-of-Africa D. melanogaster subpopulations of Θ_π∼0.5% [18], [19] and the above-specified recombination and mutation rates for our locus. Note, however, that in any case the infinite alleles model can only lead to an overestimation of the probability of observing soft sweep signatures. It is therefore always conservative in terms of our analysis. The probabilities P_1m and P_3m are not affected by the choice of an infinite alleles model.

The simulation was implemented in C++. Runs were performed on the Bio-X2 cluster at Stanford University. All source code is available from the authors upon request.

Supporting Information

References

1. Charlesworth B (2009) Fundamental concepts in genetics: effective population size and patterns of molecular evolution and variation. Nat Rev Genet 10: 195–205.

2. Zhu KY, Lee SH, Clark JM (1996) A Point Mutation of Acetylcholinesterase Associated with Azinphosmethyl Resistance and Reduced Fitness in Colorado Potato Beetle. Pestic Biochem Physiol 55: 100–108.

3. Anazawa Y, Tomita T, Aiki Y, Kozaki T, Kono Y (2003) Sequence of a cDNA encoding acetylcholinesterase from susceptible and resistant two-spotted spider mite, Tetranychus urticae. Insect Biochem Mol Biol 33: 509–514.

4. Nabeshima T, Mori A, Kozaki T, Iwata Y, Hidoh O (2004) An amino acid substitution attributable to insecticide-insensitivity of acetylcholinesterase in a Japanese encephalitis vector mosquito, Culex tritaeniorhynchus. Biochem Biophys Res Commun 313: 794–801.

5. Mutero A, Pralavorio M, Bride JM, Fournier D (1994) Resistance-associated point mutations in insecticide-insensitive acetylcholinesterase. Proc Natl Acad Sci U S A 91: 5922–5926.

6. Vontas JG, Hejazi MJ, Hawkes NJ, Cosmidis N, Loukas M (2002) Resistance-associated point mutations of organophosphate insensitive acetylcholinesterase, in the olive fruit fly Bactrocera oleae. Insect Mol Biol 11: 329–336.

7. Walsh SB, Dolden TA, Moores GD, Kristensen M, Lewis T (2001) Identification and characterization of mutations in housefly (Musca domestica) acetylcholinesterase involved in insecticide resistance. Biochem J 359: 175–181.

8. Aminetzach YT, Macpherson JM, Petrov DA (2005) Pesticide resistance via transposition-mediated adaptive gene truncation in Drosophila. Science 309: 764–767.

9. Colosimo PF, Hosemann KE, Balabhadra S, Villarreal G Jr, Dickson M (2005) Widespread parallel evolution in sticklebacks by repeated fixation of Ectodysplasin alleles. Science 307: 1928–1933.

10. Aldridge WN (1950) Some properties of specific cholinesterase with particular reference to the mechanism of inhibition by diethyl p-nitrophenyl thiophosphate (E 605) and analogues. Biochem J 46: 451–460.

11. Smissaert HR (1964) Cholinesterase Inhibition in Spider Mites Susceptible and Resistant to Organophosphate. Science 143: 129–131.

12. Menozzi P, Shi MA, Lougarre A, Tang ZH, Fournier D (2004) Mutations of acetylcholinesterase which confer insecticide resistance in Drosophila melanogaster populations. BMC Evol Biol 4: 4.

13. Shi MA, Lougarre A, Alies C, Fremaux I, Tang ZH (2004) Acetylcholinesterase alterations reveal the fitness cost of mutations conferring insecticide resistance. BMC Evol Biol 4: 5.

14. Miyo T, Oguma Y (2002) Negative correlations between resistance to three organophosphate insecticides and productivity within a natural population of Drosophila melanogaster (Diptera: Drosophilidae). J Econ Entomol 95: 1229–1238.

15. David JR, Capy P (1988) Genetic variation of Drosophila melanogaster natural populations. Trends Genet 4: 106–111.

16. Kidwell MG (1983) Evolution of hybrid dysgenesis determinants in Drosophila melanogaster. Proc Natl Acad Sci U S A 80: 1655–1659.

17. Ashburner M (1989) Drosophila: A Laboratory Handbook and Manual. New York: Cold Spring Harbor Laboratory Press.

18. Thornton K, Andolfatto P (2006) Approximate Bayesian inference reveals evidence for a recent, severe bottleneck in a Netherlands population of Drosophila melanogaster. Genetics 172: 1607–1619.

19. Li H, Stephan W (2006) Inferring the demographic history and rate of adaptive substitution in Drosophila. PLoS Genet 2: e166. doi:10.1371/journal.pgen.0020166

20. Pennings PS, Hermisson J (2006) Soft sweeps III: the signature of positive selection from recurrent mutation. PLoS Genet 2: e186. doi:10.1371/journal.pgen.0020186

21. Pennings PS, Hermisson J (2006) Soft sweeps II–molecular population genetics of adaptation from recurrent mutation or migration. Mol Biol Evol 23: 1076–1084.

22. Hermisson J, Pennings PS (2005) Soft sweeps: molecular population genetics of adaptation from standing genetic variation. Genetics 169: 2335–2352.

23. Keightley PD, Trivedi U, Thomson M, Oliver F, Kumar S (2009) Analysis of the genome sequences of three Drosophila melanogaster spontaneous mutation accumulation lines. Genome Res 19: 1195–1201.

24. Fiston-Lavier A-S, Singh ND, Lipatov M, Petrov DA (2010) Drosophila melanogaster recombination rate calculator. Gene (in press).

25. Watt WB (1972) Intragenic Recombination as a Source of Population Genetic Variability. American Naturalist 106: 737–753.

26. Hudson RR (2002) Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 18: 337–338.

27. Begun DJ, Holloway AK, Stevens K, Hillier LW, Poh YP (2007) Population genomics: whole-genome analysis of polymorphism and divergence in Drosophila simulans. PLoS Biol 5: e310. doi:10.1371/journal.pbio.0050310

28. Macpherson JM, Sella G, Davis JC, Petrov DA (2007) Genomewide spatial correspondence between nonsynonymous divergence and neutral polymorphism reveals extensive adaptation in Drosophila. Genetics 177: 2083–2099.

29. Lewontin RC (1974) The genetic basis of evolutionary change. New York: Columbia University Press.

30. Sella G, Petrov DA, Przeworski M, Andolfatto P (2009) Pervasive natural selection in the Drosophila genome? PLoS Genet 5: e1000495. doi:10.1371/journal.pgen.1000495

31. Gonzalez J, Lenkov K, Lipatov M, Macpherson JM, Petrov DA (2008) High rate of recent transposable element-induced adaptation in Drosophila melanogaster. PLoS Biol 6: e251. doi:10.1371/journal.pbio.0060251

32. Kouyos RD, Althaus CL, Bonhoeffer S (2006) Stochastic or deterministic: what is the effective population size of HIV-1? Trends Microbiol 14: 507–511.

33. Weissman DB, Desai MM, Fisher DS, Feldman MW (2009) The rate at which asexual populations cross fitness valleys. Theor Popul Biol 75: 286–300.

34. Gillespie JH (1991) The Causes of Molecular Evolution. New York: Oxford University Press.

35. Weinreich DM, Delaney NF, Depristo MA, Hartl DL (2006) Darwinian evolution can follow only very few mutational paths to fitter proteins. Science 312: 111–114.

36. Andolfatto P (2007) Hitchhiking effects of recurrent beneficial amino acid substitutions in the Drosophila melanogaster genome. Genome Res 17: 1755–1762.

37. Otto SP, Whitlock MC (1997) The probability of fixation in populations of changing size. Genetics 146: 723–733.

38. Jeong S, Rebeiz M, Andolfatto P, Werner T, True J (2008) The evolution of gene regulation underlies a morphological difference between two Drosophila sister species. Cell 132: 783–793.

39. Schlenke TA, Begun DJ (2005) Linkage disequilibrium and recent selection at three immunity receptor loci in Drosophila simulans. Genetics 169: 2013–2022.

40. Labbe P, Berthomieu A, Berticat C, Alout H, Raymond M (2007) Independent duplications of the acetylcholinesterase gene conferring insecticide resistance in the mosquito Culex pipiens. Mol Biol Evol 24: 1056–1067.

41. Enattah NS, Trudeau A, Pimenoff V, Maiuri L, Auricchio S (2007) Evidence of still-ongoing convergence evolution of the lactase persistence T-13910 alleles in humans. Am J Hum Genet 81: 615–625.

42. Coop G, Pickrell JK, Novembre J, Kudaravalli S, Li J (2009) The role of geography in human adaptation. PLoS Genet 5: e1000500. doi:10.1371/journal.pgen.1000500