Catching the drift of experimental evolution.

Blog author Doc Edge is a graduate student in Noah Rosenberg's lab.


One picture of science, popular in dramatic depictions of scientific history, shows an isolated theorist working out the implications of a bold idea. In this picture, the theorist relies, somewhat indifferently, on the ingenuity of an unknown, unspecified experimentalist who will someday—perhaps many years later—test the theorist’s ideas against unforgiving reality. It’s trite to point out that this picture isn’t a good representation of actual scientific practice. The relationship between theory and experiment both is and should be more bidirectional, more collaborative, and more nuanced than this, as philosophers of science, sociologists of science, and scientists themselves have pointed out for decades. A recent paper by CEHG graduate student Arbel Harpak (Pritchard lab) and Guy Sella (Stanford PhD 2001 from Marc Feldman’s group, in the less-sunny, pre-CEHG days) is a lesson in how experimentally-motivated theoretical work can inform both future theory and future experiment.

(Neutral) Evolution in a Test Tube

Harpak and Sella take up one of the most interesting contexts for the study of evolution: the serial-transfer experiment. In serial-transfer experiments, microorganisms (usually E. coli or yeast) are allowed to divide in an isolated container with some nutrients. Once the population in a container reaches a pre-specified size (which Harpak and Sella call N2), a sample of a pre-specified size (N1) is taken from the original container and transferred to a new one. This process of growth, sampling, and transfer is then repeated many times.

A schematic of a serial transfer experiment (from Sprouffske et al., 2012) [Note: in a previous version this image was mistakenly attributed to A. Harpak.]

Both experimenters and theorists have focused on understanding selection in serial-transfer experiments. This is sensible: the serial-transfer scenario allows researchers to manipulate the parameters that influence adaptation (population size, selective pressure, etc.) and to study the dynamics of adaptation. Harpak and Sella start from a premise that has been less widely appreciated in the serial-transfer context: namely, that the other forces of evolution, including drift and demography, are also active in serial-transfer experiments. These selectively neutral forces are ever-present, running in the background of experiments designed to study selection. Harpak and Sella do not claim that neutral forces are as important as, or more important than, selective forces in a typical serial-transfer experiment; they acknowledge that selective forces likely predominate. But they rightly point out that we cannot fully understand what selection does without understanding the context in which it acts. Population geneticists have known this for a long time, but until now, they have not formally applied this insight to serial-transfer experiments.

Harpak and Sella develop a fully articulated model for neutral evolution in a serial-transfer context. In population-genetic terms, the outcome of a serial-transfer experiment is the product of repeated periods of growth followed by bottlenecks. They build on the intuitions that population geneticists already have about population growth and bottlenecks to sketch a picture of what serial transfers will do to measurements of genetic diversity. Diversity slowly builds up as the population grows and as more distinct lineages appear in the population. But when a small subset of the population is removed and transferred to a new container, only the diversity that is represented in that small subset will be present in the next phase of the population’s growth. When the experiment involves many cycles of serial transfer, the genetic diversity ends up looking like that of a population with a constant size of N1*log2(N2/N1), where N1 is the number of organisms transferred during each cycle and N2 is the size to which the population grows just before the end of the cycle. This is a neat result: as has been found in other contexts, the genetic diversity is more strongly influenced by the severity of the bottleneck than it is by the maximum size to which the population grows.
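To make the last point concrete, here is a minimal Python sketch that simply evaluates the expression N1*log2(N2/N1) quoted above. The function name and the flask sizes are hypothetical, chosen only so that the logarithm comes out to a whole number; nothing here is taken from the paper beyond the formula itself.

```python
import math


def serial_transfer_effective_size(n1: float, n2: float) -> float:
    """Constant population size with matching diversity: N1 * log2(N2 / N1).

    n1: number of cells transferred at each bottleneck (N1 in the text).
    n2: size the population reaches just before transfer (N2 in the text).
    """
    if n1 <= 0 or n2 < n1:
        raise ValueError("expected 0 < n1 <= n2")
    return n1 * math.log2(n2 / n1)


# Hypothetical experiment: 1,000 cells transferred, grown through 10 doublings.
baseline = serial_transfer_effective_size(1_000, 1_024_000)

# Doubling the final size N2 adds only one more doubling to the cycle.
bigger_flask = serial_transfer_effective_size(1_000, 2_048_000)

# Doubling the bottleneck N1 (same final size) has a much larger effect.
bigger_bottleneck = serial_transfer_effective_size(2_000, 1_024_000)

print(baseline, bigger_flask, bigger_bottleneck)
```

In this toy comparison, doubling the final population size raises the equivalent constant size by a tenth, while doubling the bottleneck size nearly doubles it.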

Relaxing the Assumptions

This result agrees with population geneticists’ intuitions about demographic changes and genetic diversity. Nonetheless, there are two reasons why it is unrealistic for actual serial-transfer experiments. First, most serial-transfer experiments are not long enough to reach the predicted equilibrium state. Second, the equilibrium prediction assumes that all the microorganisms in the sample divide at the same time, but in reality, there is variation in the length of time that a cell persists before dividing. Harpak and Sella adjust their model to deal with both of these issues. In considering the shorter time scales typical of serial-transfer experiments, they find that the shorter the experiment is, the more the genetic diversity will be dominated by rare variation—genetic differences seen only in a single individual, for example. They also derive results for another version of their model in which the length of time to the next cell division is random. When cells vary in how long they take to divide, the average number of differences between any two cells decreases. This is because cells that divide quickly are likely to leave more offspring in subsequent generations. This means that any two cells in a subsequent generation are more likely to be recently related, since there is an increased chance that they both descend from some quickly-dividing recent ancestor.

Harpak and Sella’s paper is a step toward a fully-realized population-genetic framework for serial transfer experiments and for experimental evolution more generally. We have long known that models of evolution have to account for selectively-neutral forces in addition to selection, and it is only sensible to think of evolutionary experiments in the same way. By responding to a frequently-used experimental setup, Harpak and Sella have done a service for experimental evolution, providing a better understanding of the selectively neutral processes that act in serial transfer experiments. At the same time, Harpak and Sella perform a service that is perhaps even more important for population-genetic theory. Evolutionary experiments are an unprecedented opportunity to learn about the actions of evolutionary forces at a large scale. Moreover, experiments offer researchers the power to control the setting in which evolution takes place and to repeat evolutionary processes. Population-genetic theorists, who until recently have had to make do with the single iteration of evolution that nature has provided, ought to jump at the chance to test their ideas in real-time. Harpak and Sella’s paper is an example of how theorists attending closely to experimental methods can lay the ground for improved experiments and improved theories.


Harpak A, Sella G (2014) Neutral null models for diversity in serial transfer evolution experiments. Evolution. DOI: 10.1111/evo.12454

Sprouffske K, Merlo LM, Gerrish PJ, Maley CC, & Sniegowski PD (2012). Cancer in light of experimental evolution. Current Biology 22(17), R762-R771. DOI: 10.1016/j.cub.2012.06.065

Paper author Arbel Harpak is a graduate student in the Pritchard lab.


A framework for identifying and quantifying fitness effects across loci

Blog author Ethan Jewett is a PhD student in the lab of Noah Rosenberg.


The degree to which similarities and differences among species are the result of natural selection, rather than genetic drift, is a major question in population genetics. Related questions include: what fraction of sites in the genome of a species are affected by selection? What is the distribution of the strength of selection across genomic sites, and how have selective pressures changed over time? To address these questions, we must be able to accurately identify sites in a genome that are under selection and quantify the selective pressures that act on them.

Difficulties with existing approaches for quantifying fitness effects    

A recent paper in Trends in Genetics by David Lawrie and Dmitri Petrov (Lawrie and Petrov, 2014) provides intuition about the power of existing methods for identifying genomic regions affected by purifying selection and for quantifying the selective pressures at different sites. The paper proposes a new framework for quantifying the distribution of fitness effects across a genome. This new framework is a synthesis of two existing forms of analysis – comparative genomic analyses to identify genomic regions in which the level of divergence among two or more species is smaller than expected, and analyses of the distribution of the frequencies of polymorphisms (the site frequency spectrum, or SFS) within a single species (Figure 1). Using simulations and heuristic arguments, Lawrie and Petrov demonstrate that these two forms of analysis can be combined into a framework for quantifying selective pressures that has greater power to identify selected regions and to quantify selective strengths than either approach has on its own.


Figure 1. Using the site frequency spectrum (SFS) to quantify the strength of purifying selection. The SFS tabulates the number of polymorphisms at a given frequency in a sample of haplotypes. Under neutrality (black dots) many high-frequency polymorphisms are observed. Under purifying selection (higher values of the effective selection strength |4Nes|), a higher fraction of new mutations are deleterious, leading to fewer high-frequency polymorphisms (red and blue dots). Adapted from Lawrie and Petrov (2014).

Lawrie and Petrov begin by discussing the strengths and weaknesses of the two existing approaches. Comparative analyses of genomic divergence are useful for identifying genomic regions under purifying selection, which will exhibit lower-than-expected levels of divergence among species. However, as Lawrie and Petrov note, it can be difficult to use comparative analyses to quantify the strength of selection in a region because even mild purifying selection can result in complete conservation among species within the region (Figure 2). For example, whether the population-scaled selective strength, 4Nes, in a region is 20 or 200, the same genomic signal will be observed: complete conservation.

Figure 2. Adapted from Lawrie and Petrov (2014). The evolution of several 100kb regions was simulated in 32 different mammalian species under varying strengths of selection |4Nes|. The number of substitutions in each region was then estimated using genomic evolutionary rate profiling (GERP). The plot shows the median across regions of the number of inferred substitutions. From the plot, it can be seen that, once the strength of selection exceeds a weak threshold value (3 for the example given), there is full conservation among species.

In contrast to comparative approaches, analyses of within-species polymorphisms based on the site frequency spectrum (SFS) within a region can be used to more precisely quantify the strength of selection. For example, Figure 1 shows that different selective strengths can produce very different site frequency spectra. Moreover, if the SFS can be estimated precisely enough, it can allow us to distinguish between two different selective strengths (e.g., 4Nes1 = 20 and 4Nes2 = 200) that would both lead to total conservation in a comparative study, and would therefore be indistinguishable there. The problem is that it takes a lot of polymorphisms to obtain an accurate estimate of the SFS, and a genomic region of interest may contain too few polymorphisms, especially if the region is under purifying selection, which decreases the apparent mutation rate. Sampling additional individuals from the same species may provide little additional information about the SFS because few novel polymorphisms may be observed in the additional sample. For example, recall that for a sample of n individuals from a wildly idealized panmictic species, the expected number of novel polymorphisms observed in the (n+1)th sampled individual is proportional to 1/n (Watterson, 1975).
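The diminishing return from deeper sampling can be checked numerically. The sketch below assumes the standard neutral expectation for the number of segregating sites in a sample of n sequences, theta times the sum of 1/i for i from 1 to n-1 (Watterson, 1975); the function names and the value of theta are hypothetical and serve only as an illustration.

```python
def expected_segregating_sites(n: int, theta: float) -> float:
    """Neutral expectation for n sampled sequences: theta * sum_{i=1}^{n-1} 1/i."""
    if n < 1:
        raise ValueError("sample size must be at least 1")
    return theta * sum(1.0 / i for i in range(1, n))


def expected_new_sites(n: int, theta: float) -> float:
    """Expected extra segregating sites gained by adding the (n+1)th sequence."""
    return expected_segregating_sites(n + 1, theta) - expected_segregating_sites(n, theta)


theta = 5.0  # hypothetical population-scaled mutation rate for the region

# The gain from one more sequence shrinks as theta / n.
gains = {n: expected_new_sites(n, theta) for n in (10, 100, 1000)}
for n, gain in gains.items():
    print(f"sample of {n}: one more sequence adds about {gain:.4f} sites")
```

Under these assumptions, the thousandth sequence is expected to add about a hundredth as many new polymorphisms as the tenth.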

A proposed paradigm

Lawrie and Petrov demonstrate that studying polymorphisms by sampling many individuals across several related species (rather than sampling more individuals within a single species) could increase the observed number of polymorphisms in a region, and therefore, could increase the power to quantify the strength of selection (Figure 3) – as long as the selective forces in the genomic region are sufficiently similar across the different species.


Figure 3. The benefits of studying polymorphisms in many populations, rather than within a single population. Three populations (A, B, and C) diverge from an ancestral population, D. The genealogy of a single region is shown (slanted lines) with mutations in the region denoted by orange slashes. Additional lineages sampled in population A are likely to coalesce recently with other lineages (for example, the red clade in population A ) and, therefore, carry few mutations that have not already been observed in the sample. In comparison, the same number of lineages sampled from a second population are likely to carry additional independent polymorphisms (for example, the red lineages in population B). If the selective pressures at the locus in populations A and B are similar, then the SFS in the two populations should be similar, and the additional lineages in B can provide additional information about the SFS. For example, if the demographic histories and selective pressures at the locus are identical in populations A and B, and if the samples from populations A and B are sufficiently diverged, then a sample of K lineages from each population, A and B, will contain double the number of independent polymorphisms that are observed in a sample of K lineages from population A alone, providing double the number of mutations that can be used to estimate the SFS.

The need for sampling depth and breadth

Without getting bogged down in the details, it’s the rare variants that are often the most important for quantifying the effects of purifying selection, so one still has to sample deeply within each species; however, overall, sampling from additional species is a more efficient way of increasing the absolute number of variants that can be used to estimate the SFS in a region, compared with sampling more deeply within the same species.
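As a rough illustration of the depth-versus-breadth trade-off, the sketch below compares two hypothetical sampling designs with the same total number of sequences. It assumes the neutral Watterson expectation for segregating sites, identical theta in both species, and species diverged enough that their polymorphisms are fully independent; these are idealizations for the sake of the example, not results from the paper.

```python
def expected_segregating_sites(n: int, theta: float) -> float:
    """Neutral expectation for n sampled sequences: theta * sum_{i=1}^{n-1} 1/i."""
    if n < 1:
        raise ValueError("sample size must be at least 1")
    return theta * sum(1.0 / i for i in range(1, n))


def compare_designs(k: int, theta: float) -> tuple[float, float]:
    """Return expected polymorphism counts for (deep, broad) designs.

    deep:  2k sequences from a single species.
    broad: k sequences from each of two independent species (assumed).
    """
    deep = expected_segregating_sites(2 * k, theta)
    broad = 2 * expected_segregating_sites(k, theta)
    return deep, broad


# Hypothetical numbers: 100 sequences in total, theta = 5 for the region.
deep, broad = compare_designs(k=50, theta=5.0)
print(f"100 sequences, one species:  about {deep:.1f} polymorphisms")
print(f"50 sequences x two species:  about {broad:.1f} polymorphisms")
```

The broad design yields more polymorphisms here because doubling the sample within one species adds only a logarithmic increment, whereas a second independent species contributes a full additional set.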

The simulations and heuristic arguments presented by Lawrie and Petrov consider idealized cases for simplicity; however, the usefulness of approaches that consider polymorphisms across multiple species has been demonstrated in methods such as the McDonald-Kreitman test (McDonald and Kreitman, 1991), which have long been important tools for studying selection. More recent empirical applications of approaches that consider information about polymorphisms across multiple species appear to do a good job of quantifying selective pressures across genomes (Wilson et al., 2011; Gronau et al., 2013), even when species are closely related (De Maio et al., 2013). Overall, the simulations and arguments presented in Lawrie and Petrov’s paper provide useful guidelines for researchers interested in identifying and quantifying selective forces, and their recommendation to sample deeply within species and broadly across many species comes at a time when such analyses are becoming increasingly practical, given the recent availability of sequencing data from many species.


  1. De Maio, N., Schlötterer, C., and Kosiol, C. (2013). Linking great apes genome evolution across time scales using polymorphism-aware phylogenetic models. Molecular Biology and Evolution 30:2249-2262.
  2. Gronau, I., Arbiza, L., Mohammed, J., and Siepel, A. (2013). Inference of natural selection from interspersed genomic elements based on polymorphism and divergence. Molecular Biology and Evolution 30:1159-1171.
  3. Lawrie, D.S. and Petrov, D.A. (2014). Comparative population genomics: power and principles for the inference of functionality. Trends in Genetics 30:133-139.
  4. Watterson, G.A. (1975). On the number of segregating sites in genetical models without recombination. Theoretical Population Biology 7:256-276.
  5. Wilson, D.J., Hernandez, R.D., Andolfatto, P., and Przeworski, M. (2011). A population genetics-phylogenetics approach to inferring natural selection in coding sequences. PLoS Genetics 7:e1002395.

Paper author: David Lawrie was a graduate student in Dmitri Petrov’s lab. He is now a postdoc at USC.




Taking studies of regulatory evolution to the next level: translation

Carlo Artieri, a postdoc in the group of Hunter Fraser, wrote this blog post. The paper is written by Carlo and Hunter.


Carlo Artieri writes about his new paper, “Evolution at two levels of gene expression in yeast”, which is in press in Genome Research.

Understanding the molecular basis of regulatory variation within and between species has become a major focus of modern genetics. For instance, the majority of identified human disease-risk alleles lie in non-coding regions of the genome, suggesting that they affect gene regulation (Epstein 2009). Furthermore, it has been argued that regulatory changes have played a dominant role in explaining uniquely human attributes (King and Wilson 1975). However, our knowledge of gene regulatory evolution is based almost entirely on studies of mRNA levels, despite both the greater functional importance of protein abundance and evidence that post-transcriptional regulation is pervasive. The availability of high-throughput methods for measuring mRNA abundance, coupled with the lack of comparable methods at the protein level, has contributed to this focus; however, a new method known as ribosome profiling, or ‘riboprofiling’ (Ingolia et al. 2009), has enabled us to study the evolution of translation in much greater detail than was possible before. This method involves the construction of two RNA-seq libraries: one measuring mRNA abundance (the ‘mRNA’ fraction), and the second capturing the portion of the transcriptome that is actively being translated by ribosomes (the ‘Ribo’ fraction). On average, the abundance of genes within the Ribo fraction should be proportional to that of the mRNA fraction. Genes with increased translational efficiency are identified when Ribo fraction abundance is higher than that of the mRNA fraction, whereas reduced translational efficiency is inferred when the opposite is observed.
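The comparison of the two fractions can be sketched as a log ratio of normalized read counts. The snippet below is only an illustration: the function name, the pseudocount, the simple library-size normalization, and all read counts are assumptions, and the paper’s actual normalization and statistics may differ.

```python
import math


def log2_translational_efficiency(
    ribo_count: int,
    mrna_count: int,
    ribo_total: int,
    mrna_total: int,
    pseudocount: float = 0.5,
) -> float:
    """log2 of (Ribo fraction abundance / mRNA fraction abundance) for one gene.

    Counts are scaled by library size so the two fractions are comparable;
    the pseudocount (an assumption) avoids division by zero for silent genes.
    """
    ribo_rate = (ribo_count + pseudocount) / ribo_total
    mrna_rate = (mrna_count + pseudocount) / mrna_total
    return math.log2(ribo_rate / mrna_rate)


# Hypothetical genes in two libraries of one million reads each.
te_up = log2_translational_efficiency(400, 100, 1_000_000, 1_000_000)
te_down = log2_translational_efficiency(50, 200, 1_000_000, 1_000_000)

print(f"gene with more Ribo than mRNA reads: log2 TE = {te_up:+.2f}")
print(f"gene with fewer Ribo than mRNA reads: log2 TE = {te_down:+.2f}")
```

A positive value corresponds to increased translational efficiency and a negative value to reduced efficiency, in the sense used above.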

Riboprofiling of yeast hybrids

We performed riboprofiling on hybrids of two closely related species of budding yeast, Saccharomyces cerevisiae and S. paradoxus (~5 million years diverged). In hybrids, the parental alleles at a locus share the same trans cellular environment; therefore, in the absence of cis-regulatory divergence in transcription, both alleles should be expressed at equal levels. Conversely, cis-regulatory divergence will produce unequal expression of alleles (termed allele-specific expression, or ‘ASE’). Cis-regulatory divergence at the translational level is detected when ASE in the mRNA fraction does not equal that measured in the Ribo fraction, indicating independent divergence across levels. We also performed riboprofiling on the two parental strains, as differences in the expression of orthologs between parental species that cannot be explained by the allelic differences in the hybrids can be attributed to trans divergence. Therefore, by measuring differences in the magnitudes of ASE between the two riboprofiling fractions in the hybrids and the parents, we identified independent cis and trans regulatory changes in both mRNA abundance and translational efficiency.
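The logic of the hybrid design can be written down as simple log-ratio arithmetic. The sketch below is a simplified illustration, not the statistical procedure used in the paper: it assumes already-normalized read counts, takes cis divergence as the allelic log ratio in the hybrid and trans divergence as whatever parental difference remains, and uses hypothetical counts and function names throughout.

```python
import math


def log2_ratio(scer: float, spar: float, pseudocount: float = 0.5) -> float:
    """log2(S. cerevisiae / S. paradoxus), with a small assumed pseudocount."""
    return math.log2((scer + pseudocount) / (spar + pseudocount))


def cis_trans(parent_scer, parent_spar, hybrid_scer, hybrid_spar):
    """Split the parental expression difference into cis and trans parts."""
    total = log2_ratio(parent_scer, parent_spar)  # divergence between parents
    cis = log2_ratio(hybrid_scer, hybrid_spar)    # ASE in the shared hybrid cell
    trans = total - cis                           # remainder not explained by ASE
    return cis, trans


# Hypothetical normalized counts for one ortholog pair.
mrna_cis, mrna_trans = cis_trans(400, 100, 200, 100)  # mRNA fraction
ribo_cis, ribo_trans = cis_trans(100, 100, 100, 100)  # Ribo fraction

# ASE in the Ribo fraction beyond what the mRNA fraction explains.
translational_cis = ribo_cis - mrna_cis

print(f"mRNA level:   cis = {mrna_cis:+.2f}, trans = {mrna_trans:+.2f}")
print(f"translation:  cis = {translational_cis:+.2f}")
```

In this made-up example the S. cerevisiae allele has higher mRNA abundance but equal ribosome occupancy, so the translational cis effect points the opposite way to the mRNA cis effect.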


We found that both cis and trans regulatory divergence in translational efficiency is widespread, and of comparable magnitude to divergence at the mRNA level – indicating that we miss much regulatory evolution by focusing on mRNA in isolation. Moreover, we observed an overwhelming bias towards divergence in opposing parental directions, indicating that while many orthologs had higher mRNA abundance in one parent, they often showed increased translational efficiency in the other parent. This suggests that stabilizing selection acts to maintain more similar protein levels between species than would be expected by comparing mRNA abundances alone.

Translational divergence not associated with TATA boxes

Interestingly, while we confirmed the results of previous studies indicating that both cis and trans regulatory divergence at the mRNA level are associated with the presence of TATA boxes and nucleosome free regions in promoters, no such relationship was found for translational divergence, indicating that these regulatory systems have different underlying architectures.

Evidence for polygenic selection at two levels

We also searched for evidence of polygenic selection in and between both regulatory levels by applying a recently developed modification of Orr’s sign test (Orr 1998; Fraser et al. 2010; Bullard et al. 2010). Under neutral divergence, no pattern is expected with regard to the parental direction of up- or down-regulating alleles among orthologs within a functional group (e.g., a pathway or multi-gene complex). However, a significant bias towards one parental lineage is evidence of lineage-specific selection. This analysis uncovered evidence of polygenic selection at both regulatory levels in a number of functional groups. In particular, genes involved in tolerance to heavy metals were enriched for reinforcing divergence in mRNA abundance and translation favoring S. cerevisiae. Increased tolerance to these metals has been observed in S. cerevisiae (Warringer et al. 2011), suggesting that domesticated yeasts have experienced a history of polygenic adaptation across regulatory levels allowing them to grow on metals such as copper.
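The intuition behind a sign test of this kind can be sketched with an exact binomial test. This is a deliberately simplified stand-in for the published modification: it assumes independent genes and a null probability of 0.5 that the up-regulating allele comes from a given parent, whereas the real test may use a different background expectation. The gene counts are hypothetical.

```python
from math import comb


def two_sided_binomial_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely than
    the observed count k out of n under success probability p.
    """
    if not 0 <= k <= n:
        raise ValueError("expected 0 <= k <= n")
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so ties are included despite rounding.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))


# Hypothetical functional group of 20 genes with regulatory divergence.
p_biased = two_sided_binomial_p(18, 20)    # 18 favor the same parent
p_balanced = two_sided_binomial_p(10, 20)  # evenly split between parents

print(f"18 of 20 in one direction: p = {p_biased:.6f}")
print(f"10 of 20 in one direction: p = {p_balanced:.6f}")
```

A strongly one-sided group gives a small p-value, while an evenly split group is fully compatible with the neutral expectation.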

Finally, using data from the Ribo fraction, we also uncovered multiple instances of conserved stop-codon readthrough, a mechanism via which the ribosome ‘ignores’ the canonical stop codon and produces a C-terminally extended peptide. Only two cases of C-terminal extensions have previously been observed in yeast, though in one such case, PDE2, extension of the canonical protein plays a functional role in regulating cAMP levels (Namy et al. 2002). Our data suggest that this mechanism may occur in dozens of genes, highlighting yet another post-transcriptional mechanism leading to increased proteomic diversity.


By applying a novel approach to a long-standing question, our analysis has revealed that post-transcriptional regulation is abundant, and likely as important as transcriptional regulation. We argue that partitioning the search for the locus of selection into the binary categories of ‘coding’ vs. ‘regulatory’ overlooks the many opportunities for selection to act at multiple regulatory levels along the path from genotype to phenotype.


Artieri CG, Fraser HB. 2013. Evolution at two levels of gene expression in yeast. Genome Research (in press).
Preprint on the arXiv. 

Bullard JH, Mostovoy Y, Dudoit S, Brem RB. 2010. Polygenic and directional regulatory evolution across pathways in Saccharomyces. Proc Natl Acad Sci USA 107: 5058-5063.

Epstein DJ. 2009. Cis-regulatory mutations in human disease. Brief Funct Genomic Proteomic 8: 310–316.

Fraser HB, Moses AM, Schadt EE. 2010. Evidence for widespread adaptive evolution of gene expression in budding yeast. Proc Natl Acad Sci USA 107: 2977-2982.

Ingolia NT, Ghaemmaghami S, Newman JR, Weissman JS. 2009. Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science 324:218-223.

King MC, Wilson AC. 1975. Evolution at two levels in humans and chimpanzees. Science 188: 107-116.

Namy O, Duchateau-Nguyen G, Rousset JP. 2002. Translational readthrough of the PDE2 stop codon modulates cAMP levels in Saccharomyces cerevisiae. Mol Microbiol 43: 641-652.

Orr HA. 1998. Testing natural selection vs. genetic drift in phenotypic evolution using quantitative trait locus data. Genetics 149: 2099-2104.

Warringer J, Zörgö E, Cubillos FA, Zia A, Gjuvsland A, Simpson JT, Forsmark A, Durbin R, Omholt SW, Louis EJ, Liti G, Moses A, Blomberg A. 2011. Trait variation in yeast is defined by population history. PLoS Genet 7: e1002111.

How recombination and changing environments affect new mutations

Blog author: David Lawrie was a graduate student in Dmitri Petrov’s lab. He is now a postdoc at USC.

I recently sat down with Oana Carja, a graduate student with Marc Feldman, to discuss her paper published in the journal Theoretical Population Biology, entitled “Evolution with stochastic fitnesses: A role for recombination”. In it, the authors Oana Carja, Uri Liberman, and Marcus Feldman explore when a new mutation can invade an infinite, randomly mating population that experiences temporal fluctuations in selection.

The one locus case

This work builds off of previous research in the field on how the fluctuations in fitness over time (i.e., increased variance of fitness) affect the invasion dynamics of a mutation at a single locus. For a single locus, it has been shown that the geometric mean of the fitness of the allele over time determines the ability of an allele to invade a population. This effect is known as the geometric mean principle. Fluctuations in fitness increase the variance and therefore decrease the geometric mean fitness. The variance of the fitness of the allele over time thus greatly impacts the ability of that allele to invade a population.
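A small numerical example shows why fluctuations matter. The sketch below compares two hypothetical fitness series with the same arithmetic mean; the values and function names are made up for illustration and are not taken from the paper.

```python
import math


def arithmetic_mean(fitnesses: list[float]) -> float:
    """Ordinary average of per-generation fitnesses."""
    return sum(fitnesses) / len(fitnesses)


def geometric_mean(fitnesses: list[float]) -> float:
    """Geometric mean, computed via logs; fitnesses must be positive."""
    return math.exp(sum(math.log(w) for w in fitnesses) / len(fitnesses))


# Two hypothetical alleles with the same average fitness over four generations.
steady = [1.0, 1.0, 1.0, 1.0]
fluctuating = [1.5, 0.5, 1.5, 0.5]

for name, series in (("steady", steady), ("fluctuating", fluctuating)):
    print(
        f"{name}: arithmetic mean = {arithmetic_mean(series):.3f}, "
        f"geometric mean = {geometric_mean(series):.3f}"
    )
```

Both series average to 1, but the fluctuating allele has a geometric mean below 1, so its long-run growth is lower despite the identical arithmetic mean.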

What if there are two loci?

In investigating a two locus model, the researchers split the loci by their effect on the temporally-varying fitness: one locus only affects the mean, while the other controls the variance. The authors demonstrate through theory and simulation that:

1)    allowing for recombination between the two loci increases the threshold for the combined fitness of the two mutant alleles to invade the population beyond the geometric mean (see figure).

2)    periodic oscillations in the fitness of the alleles over time lead to higher fitness thresholds for invasion over completely random fluctuations (see figure).

3)    edge case scenarios allow for the maintenance of polymorphisms in the population despite clear selective advantages of a subset of allelic combinations.

Temporally changing environments and recombination thus make it overall more difficult for new alleles to invade a population.


Invasibility thresholds as a function of recombination rate. If there is no recombination (the left-most edge of the figure), the geometric mean of the pair of new alleles needs to be higher than 0.5 to allow for invasion, because the resident alleles’ geometric mean fitness is set to 0.5. However, as recombination between the two loci increases, the geometric mean needed for invasion increases rapidly. If there is free recombination (r = 0.5) then  invasion can only happen if the new alleles’ geometric mean fitness is twice the resident alleles’ geometric mean fitness (light grey area). If the environment is changing periodically, it is even harder for new alleles to invade a population (dark grey area).

The evolution of models of evolution

This work is important for addressing the evolutionary dynamics of loci controlling phenotypic variance – in this case, controlling the ability of a phenotype to maintain its fitness even if the environment is variable. Most environments undergo significant temporal shifts, from the simple changing of the seasons to larger-scale changes such as El Niño and climate change, through which species must survive and thrive. For organisms in the wild, many alleles that confer a benefit in one environment will be deleterious when the environment and selective pressures change. There may be modifier loci that buffer the fitness of those loci in the face of changing environments. Such modifier loci have recently been found in GWAS and may be important for overall phenotypic variance. Thus modeling the patterns of evolution for multiple loci in temporally varying environments is a key component of advancing our understanding of the patterns found in nature.

Future work

Epigenetic modifiers are a hot area of research and one potential biological mechanism to control phenotypic variance. The evolution of such epigenetic regulation is a particular research interest of Oana. Future work will continue to explore the evolutionary dynamics of epigenetic regulation and focus on applying the above results to finite populations.

Paper author: Oana Carja is a graduate student with Marc Feldman


Oana Carja, Uri Liberman, Marcus W. Feldman, Evolution with stochastic fitnesses: A role for recombination, Theoretical Population Biology, Volume 86, June 2013, Pages 29-42, ISSN 0040-5809.

Missing the forest for the trees: How frequent adaptation can confound its own inference


Blog author: Fabian Staubach was a postdoc in Dmitri Petrov’s lab and is now an assistant professor in Freiburg, Germany.


The neutral theory of molecular evolution assumes that adaptation is rare and that the effect of adaptation on linked variation, the so-called hitchhiking effect, typically has only little influence on the dynamics of molecular genetic variation. Because of this assumption, it is widely assumed that in most natural populations, hitchhiking can be neglected, or at least reasonably well approximated by the introduction of effective parameters, such as an effective population size. But if molecular adaptation is in fact common, then the assumption may be violated, and we should worry whether population genetic methods and estimates of evolutionary parameters obtained from them are robust to frequent hitchhiking.

In their paper “Frequent adaptation and the McDonald-Kreitman test” (PNAS, 2013), Philipp Messer and Dmitri Petrov investigate this question for one of the key population genetic methods — the McDonald-Kreitman (MK) test. This test forms the basis of most commonly used approaches to measure the rate of adaptation from population genomic data and has been used to argue that in some organisms, such as Drosophila, the rate of adaptation is surprisingly high.

The MK test can substantially underestimate the true rate of adaptation

Messer and Petrov employ their powerful forward simulation software, SLiM, to simulate the evolution of entire chromosomes under a range of parameter values relevant to humans and other organisms, and apply various forms of the MK test to the population genomic data resulting from their simulations. They then study how accurately these methods re-infer the true evolutionary parameters in the simulations. Strikingly, they find that the MK test can substantially underestimate the true rate of adaptation. For instance, they present scenarios where 40% of the amino acid changing substitutions were in fact strongly adaptive in the simulations, while other population parameters resembled those commonly inferred for human evolution, yet the standard MK estimates suggest that none of these substitutions were actually adaptive. Fortunately, Messer and Petrov propose a way to avoid these problems by using a simple, asymptotic extension of the MK test.

Figure: Illustration of the asymptotic MK estimation of the rate of adaptive substitutions. The standard MK approach assumes that all polymorphisms (non-synonymous and synonymous) are neutral. This assumption is likely violated for low frequency polymorphisms, as some of these are likely to be deleterious. The assumption should hold for very high frequency polymorphisms, because they are very unlikely to be deleterious. The asymptotic MK approach uses this fact by looking at the estimated rate from different frequency classes of alleles, and extrapolating to x=1, where the rate is expected to have reached its asymptote.
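The idea in the caption can be sketched with a few lines of Python. The snippet assumes the commonly used MK-based estimator of the fraction of adaptive substitutions, alpha = 1 - (Ds/Dn)(Pn/Ps), applied once to pooled polymorphism counts (the standard approach) and once per frequency class. All counts, frequency bins, and names are hypothetical, and the final curve-fitting and extrapolation step to x=1 is deliberately left out.

```python
def mk_alpha(d_n: float, d_s: float, p_n: float, p_s: float) -> float:
    """MK-based estimate: alpha = 1 - (d_s / d_n) * (p_n / p_s).

    d_n, d_s: non-synonymous and synonymous substitutions (divergence).
    p_n, p_s: non-synonymous and synonymous polymorphisms.
    """
    if d_n <= 0 or p_s <= 0:
        raise ValueError("d_n and p_s must be positive")
    return 1.0 - (d_s / d_n) * (p_n / p_s)


# Hypothetical divergence counts.
D_N, D_S = 60, 100

# Hypothetical polymorphism counts by derived-allele frequency class:
# (frequency x, non-synonymous count, synonymous count).
FREQUENCY_BINS = [
    (0.1, 40, 50),  # low frequency: enriched for deleterious variants
    (0.5, 12, 25),
    (0.9, 6, 15),   # high frequency: mostly effectively neutral
]

# Standard MK: pool all polymorphisms regardless of frequency.
pooled_p_n = sum(p_n for _, p_n, _ in FREQUENCY_BINS)
pooled_p_s = sum(p_s for _, _, p_s in FREQUENCY_BINS)
standard_alpha = mk_alpha(D_N, D_S, pooled_p_n, pooled_p_s)

# Frequency-specific estimates alpha(x), the input to the extrapolation.
alpha_by_frequency = [
    (x, mk_alpha(D_N, D_S, p_n, p_s)) for x, p_n, p_s in FREQUENCY_BINS
]

print(f"standard (pooled) alpha: {standard_alpha:+.3f}")
for x, alpha in alpha_by_frequency:
    print(f"alpha at frequency {x}: {alpha:+.3f}")
```

In this toy data set the pooled estimate is dragged toward zero or below by the excess of low-frequency non-synonymous variants, while the estimates rise with frequency, which is the pattern the asymptotic approach exploits.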

The bigger claim of this straightforward and easy-to-read paper is that the effects of linked selection cannot be simply swept under the rug by introducing effective parameters, such as effective population size or effective strength of selection, and then using these effective parameters in formulae derived from the diffusion approximation under the assumption of free recombination.

Quantifying known biases

Surely, this paper will ruffle some feathers. Some people will argue that these problems have been known for a while in theory. Yet despite this, the vast majority of studies that continue to appear in the literature still pay only cursory lip service, if anything, to these issues. Presumably, this is because it is not well understood analytically to what extent linkage effects bias population genetic estimates, and Messer and Petrov therefore do an important job in quantifying these biases. Hopefully this will encourage the community to work out how to modify commonly used approaches and place them on a more solid foundation.

Citation: Messer, P. W., & Petrov, D. A. (2013). Frequent adaptation and the McDonald-Kreitman test. Proceedings of the National Academy of Sciences of the United States of America, 110(21), 8615–20. doi:10.1073/pnas.1220835110


Paper author: Philipp Messer is a research associate in Dmitri Petrov’s lab at Stanford, where he studies the population genetics of adaptation using theoretical and computational approaches in concert with the analysis of large-scale population genomic data.