A framework for identifying and quantifying fitness effects across loci

Blog author Ethan Jewett is a PhD student in the lab of Noah Rosenberg.

The degree to which similarities and differences among species are the result of natural selection, rather than genetic drift, is a major question in population genetics. Related questions include: what fraction of sites in the genome of a species are affected by selection? What is the distribution of the strength of selection across genomic sites, and how have selective pressures changed over time? To address these questions, we must be able to accurately identify sites in a genome that are under selection and quantify the selective pressures that act on them.

Difficulties with existing approaches for quantifying fitness effects

A recent paper in Trends in Genetics by David Lawrie and Dmitri Petrov (Lawrie and Petrov, 2014) provides intuition about the power of existing methods for identifying genomic regions affected by purifying selection and for quantifying the selective pressures at different sites. The paper proposes a new framework for quantifying the distribution of fitness effects across a genome. This new framework is a synthesis of two existing forms of analysis – comparative genomic analyses to identify genomic regions in which the level of divergence among two or more species is smaller than expected, and analyses of the distribution of the frequencies of polymorphisms (the site frequency spectrum, or SFS) within a single species (Figure 1). Using simulations and heuristic arguments, Lawrie and Petrov demonstrate that these two forms of analysis can be combined into a framework for quantifying selective pressures that has greater power to identify selected regions and to quantify selective strengths than either approach has on its own.

Figure 1. Using the site frequency spectrum (SFS) to quantify the strength of purifying selection. The SFS tabulates the number of polymorphisms at a given frequency in a sample of haplotypes. Under neutrality (black dots) many high-frequency polymorphisms are observed. Under purifying selection (higher values of the effective selection strength |4Nes|), a higher fraction of new mutations are deleterious, leading to fewer high-frequency polymorphisms (red and blue dots). Adapted from Lawrie and Petrov (2014).
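
As a minimal sketch of what "tabulating" the SFS means, the Python snippet below counts, for each site, how many haplotypes carry the derived allele, and then counts how many sites fall at each frequency. The 0/1 coding of haplotypes, the function name site_frequency_spectrum, and the toy data are assumptions made for illustration; they are not taken from Lawrie and Petrov (2014).

```python
from collections import Counter


def site_frequency_spectrum(haplotypes):
    """Return the unfolded SFS of a sample of haplotypes.

    haplotypes: list of equal-length lists, with 0 = ancestral allele and
    1 = derived allele (an assumed coding). Entry i - 1 of the result is the
    number of sites at which exactly i of the n haplotypes carry the derived
    allele, for i = 1, ..., n - 1.
    """
    n = len(haplotypes)
    num_sites = len(haplotypes[0])
    # Count derived alleles at each site, then count sites per allele count.
    derived_counts = Counter(
        sum(hap[site] for hap in haplotypes) for site in range(num_sites)
    )
    # Sites with 0 or n derived copies are not polymorphic in the sample.
    return [derived_counts.get(i, 0) for i in range(1, n)]


# Hypothetical toy sample: 4 haplotypes (rows) by 5 sites (columns).
sample = [
    [0, 1, 0, 1, 0],
    [0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
]
sfs = site_frequency_spectrum(sample)
```

In this toy sample, three sites are singletons, one site carries the derived allele on three haplotypes, and one site is monomorphic and is therefore left out of the spectrum.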

Lawrie and Petrov begin by discussing the strengths and weaknesses of the two existing approaches. Comparative analyses of genomic divergence are well suited to identifying genomic regions under purifying selection, which exhibit lower-than-expected levels of divergence among species. However, as Lawrie and Petrov note, it can be difficult to use comparative analyses to quantify the strength of selection in a region, because even mild purifying selection can result in complete conservation of the region among species (Figure 2). For example, whether the population-scaled selective strength, 4Nes, in a region is 20 or 200, the same genomic signal will be observed: complete conservation.

Figure 2. Adapted from Lawrie and Petrov (2014). The evolution of several 100 kb regions was simulated in 32 different mammalian species under varying strengths of selection |4Nes|. The number of substitutions in each region was then estimated using genomic evolutionary rate profiling (GERP). The plot shows the median across regions of the number of inferred substitutions. From the plot, it can be seen that, once the strength of selection exceeds a weak threshold value (3 for the example given), there is full conservation among species.
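
One way to build intuition for why divergence saturates so quickly is the standard diffusion approximation for the fixation probability of a new mutation (Kimura's result, which is not part of the simulations described above). Under that approximation, and assuming additive selection in a constant-size population, the substitution rate relative to the neutral rate is roughly S / (1 - exp(-S)), where S is the population-scaled selection coefficient and is negative for deleterious mutations. The sketch below evaluates this ratio; the function name and the choice of S values are illustrative assumptions, and scaling conventions (2N versus 4N) differ between papers.

```python
import math


def relative_substitution_rate(scaled_s):
    """Substitution rate relative to neutrality under the diffusion approximation.

    scaled_s: population-scaled selection coefficient S (assumed additive
    selection, constant population size); negative values are deleterious.
    """
    if scaled_s == 0:
        return 1.0  # neutral mutations substitute at the neutral rate
    # S / (1 - exp(-S)), written with expm1 for numerical accuracy.
    return scaled_s / (-math.expm1(-scaled_s))


# Relative rates for purifying selection of strength |S| = 3, 20, and 200.
rates = {strength: relative_substitution_rate(-strength) for strength in (3, 20, 200)}
```

Under these assumptions the relative rate falls off roughly exponentially with |S|, so regions with |S| = 20 and |S| = 200 are both expected to show essentially no substitutions, which is why divergence alone cannot tell them apart.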

In contrast to comparative approaches, analyses of within-species polymorphisms based on the site frequency spectrum (SFS) within a region can be used to quantify the strength of selection more precisely. For example, Figure 1 shows that different selective strengths can produce very different site frequency spectra. Moreover, if the SFS can be estimated precisely enough, it can allow us to distinguish between two different selective strengths (e.g., 4Nes1 = 20 and 4Nes2 = 200) that would both lead to total conservation in a comparative study, and would therefore be indistinguishable there. The problem is that it takes a lot of polymorphisms to obtain an accurate estimate of the SFS, and a genomic region of interest may contain too few polymorphisms, especially if the region is under purifying selection, which decreases the apparent mutation rate. Sampling additional individuals from the same species may provide little additional information about the SFS because few novel polymorphisms may be observed in the additional sample. For example, recall that for a sample of n individuals from an idealized panmictic species, the expected number of novel polymorphisms observed in the (n+1)st sampled individual is proportional to 1/n (Watterson, 1975).
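
The diminishing returns can be made concrete with Watterson's (1975) result that, under the standard neutral model, the expected number of segregating sites in a sample of n sequences is theta * (1 + 1/2 + ... + 1/(n - 1)), where theta is the population-scaled mutation rate of the region. The sketch below computes the expected gain from adding one more sequence; the function names and the value of theta are illustrative assumptions, and each sampled "individual" is treated as a single sequence.

```python
def expected_segregating_sites(n, theta):
    """Watterson's expectation: theta * (1 + 1/2 + ... + 1/(n - 1))."""
    return theta * sum(1.0 / i for i in range(1, n))


def expected_new_sites(n, theta):
    """Expected number of extra segregating sites from the (n + 1)st sequence."""
    return expected_segregating_sites(n + 1, theta) - expected_segregating_sites(n, theta)


theta = 5.0  # hypothetical scaled mutation rate for the region

# Expected gain from one more sequence when 10, 100, or 1000 are already sampled.
gains = {n: expected_new_sites(n, theta) for n in (10, 100, 1000)}
```

The gain equals theta / n, so each tenfold increase in sample size cuts the expected yield of the next sequence by a factor of ten.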

A proposed paradigm

Lawrie and Petrov demonstrate that studying polymorphisms by sampling many individuals across several related species (rather than sampling more individuals within a single species) could increase the observed number of polymorphisms in a region, and therefore, could increase the power to quantify the strength of selection (Figure 3) – as long as the selective forces in the genomic region are sufficiently similar across the different species.

Figure 3. The benefits of studying polymorphisms in many populations, rather than within a single population. Three populations (A, B, and C) diverge from an ancestral population, D. The genealogy of a single region is shown (slanted lines) with mutations in the region denoted by orange slashes. Additional lineages sampled in population A are likely to coalesce recently with other lineages (for example, the red clade in population A) and, therefore, carry few mutations that have not already been observed in the sample. In comparison, the same number of lineages sampled from a second population are likely to carry additional independent polymorphisms (for example, the red lineages in population B). If the selective pressures at the locus in populations A and B are similar, then the SFS in the two populations should be similar, and the additional lineages in B can provide additional information about the SFS. For example, if the demographic histories and selective pressures at the locus are identical in populations A and B, and if the samples from populations A and B are sufficiently diverged, then a sample of K lineages from each population, A and B, will contain double the number of independent polymorphisms that are observed in a sample of K lineages from population A alone, providing double the number of mutations that can be used to estimate the SFS.
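
To put rough numbers on the comparison in Figure 3, the sketch below contrasts two ways of spending a budget of 2K lineages: all 2K from population A, or K from each of A and B. It is only a back-of-the-envelope illustration under stated assumptions: neutral expectations from Watterson's formula, the same scaled mutation rate theta in both populations, and samples diverged enough that their polymorphisms are fully independent. The function names and parameter values are hypothetical.

```python
def harmonic(m):
    """Return 1 + 1/2 + ... + 1/m (0 when m is 0)."""
    return sum(1.0 / i for i in range(1, m + 1))


def expected_sites_one_population(num_lineages, theta):
    """Expected segregating sites in one population under neutrality."""
    return theta * harmonic(num_lineages - 1)


def compare_sampling_schemes(k, theta):
    """Return expected polymorphism counts for (deep, broad) sampling.

    deep:  2k lineages, all from population A.
    broad: k lineages from A plus k from B, assumed fully independent.
    """
    deep = expected_sites_one_population(2 * k, theta)
    broad = 2 * expected_sites_one_population(k, theta)
    return deep, broad


# Hypothetical example: K = 50 lineages per population, theta = 1.
deep, broad = compare_sampling_schemes(50, 1.0)
```

Because the harmonic sum grows only logarithmically, doubling the depth in one population adds far fewer expected polymorphisms than adding an equally sized sample from a second, sufficiently diverged population.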

The need for sampling depth and breadth

Without getting bogged down in the details, it’s the rare variants that are often the most important for quantifying the effects of purifying selection, so one still has to sample deeply within each species; however, overall, sampling from additional species is a more efficient way of increasing the absolute number of variants that can be used to estimate the SFS in a region, compared with sampling more deeply within the same species.

The simulations and heuristic arguments presented by Lawrie and Petrov consider idealized cases for simplicity; however, the usefulness of approaches that consider polymorphisms across multiple species has been demonstrated in methods such as the McDonald-Kreitman test (McDonald and Kreitman, 1991), which have long been important tools for studying selection. More recent empirical applications of approaches that consider information about polymorphisms across multiple species appear to do a good job of quantifying selective pressures across genomes (Wilson et al., 2011; Gronau et al., 2013), even when species are closely related (De Maio et al., 2013). Overall, the simulations and arguments presented in Lawrie and Petrov’s paper provide useful guidelines for researchers interested in identifying and quantifying selective forces, and their recommendation to sample deeply within species and broadly across many species comes at a time when such analyses are becoming increasingly practical, given the recent availability of sequencing data from many species.

References:

  1. De Maio, N., Schlötterer, C., and Kosiol, C. (2013). Linking great apes genome evolution across time scales using polymorphism-aware phylogenetic models. Molecular Biology and Evolution 30:2249-2262.
  2. Gronau, I., Arbiza, L., Mohammed, J., and Siepel, A. (2013). Inference of natural selection from interspersed genomic elements based on polymorphism and divergence. Molecular Biology and Evolution 30:1159-1171.
  3. Lawrie, D.S. and Petrov, D.A. (2014). Comparative population genomics: power and principles for the inference of functionality. Trends in Genetics 30:133-139.
  4. Watterson, G.A. (1975). On the number of segregating sites in genetical models without recombination. Theoretical Population Biology 7:256-276.
  5. Wilson, D.J., Hernandez, R.D., Andolfatto, P., and Przeworski, M. (2011). A population genetics-phylogenetics approach to inferring natural selection in coding sequences. PLoS Genetics 7:e1002395.

Paper author: David Lawrie was a graduate student in Dmitri Petrov’s lab. He is now a postdoc at USC.

How recombination and changing environments affect new mutations

Blog author: David Lawrie was a graduate student in Dmitri Petrov’s lab. He is now a postdoc at USC.

I recently sat down with Oana Carja, a graduate student with Marc Feldman, to discuss her paper “Evolution with stochastic fitnesses: A role for recombination”, published in the journal Theoretical Population Biology. In it, the authors Oana Carja, Uri Liberman, and Marcus Feldman explore when a new mutation can invade an infinite, randomly mating population that experiences temporal fluctuations in selection.

The one locus case

This work builds on previous research into how fluctuations in fitness over time (i.e., increased variance in fitness) affect the invasion dynamics of a mutation at a single locus. For a single locus, it has been shown that the geometric mean of an allele’s fitness over time determines whether the allele can invade a population. This result is known as the geometric mean principle. For a given arithmetic mean, fluctuations in fitness increase the variance and therefore decrease the geometric mean fitness. The variance of the fitness of the allele over time thus greatly impacts the ability of that allele to invade a population.
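
A small numerical illustration of the geometric mean principle: the two hypothetical fitness series below have the same arithmetic mean, but the fluctuating one has a lower geometric mean. The fitness values and function names are made up for illustration and are not taken from the paper.

```python
import math


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    # Average on the log scale, then transform back.
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical per-generation fitnesses of an allele in two environments.
constant = [1.0, 1.0, 1.0, 1.0]      # no temporal variance
fluctuating = [1.5, 0.5, 1.5, 0.5]   # same arithmetic mean, high variance

same_arithmetic_mean = math.isclose(
    arithmetic_mean(constant), arithmetic_mean(fluctuating)
)
geometric_mean_drop = geometric_mean(constant) - geometric_mean(fluctuating)
```

Long-term growth multiplies fitnesses across generations, which is why the geometric mean, rather than the arithmetic mean, is the relevant quantity for invasion.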

What if there are two loci?

In investigating a two-locus model, the researchers split the loci by their effect on the temporally varying fitness: one locus only affects the mean, while the other controls the variance. The authors demonstrate through theory and simulation that:

1) allowing for recombination between the two loci increases the threshold for the combined fitness of the two mutant alleles to invade the population beyond the geometric mean (see figure).

2) periodic oscillations in the fitness of the alleles over time lead to higher fitness thresholds for invasion than completely random fluctuations do (see figure).

3) edge-case scenarios allow for the maintenance of polymorphisms in the population despite clear selective advantages of a subset of allelic combinations.

Temporally changing environments and recombination thus make it overall more difficult for new alleles to invade a population.

Invasibility thresholds as a function of recombination rate. If there is no recombination (the left-most edge of the figure), the geometric mean fitness of the pair of new alleles needs to be higher than 0.5 to allow for invasion, because the resident alleles’ geometric mean fitness is set to 0.5. However, as recombination between the two loci increases, the geometric mean fitness needed for invasion increases rapidly. If there is free recombination (r = 0.5), then invasion can only happen if the new alleles’ geometric mean fitness is twice the resident alleles’ geometric mean fitness (light grey area). If the environment is changing periodically, it is even harder for new alleles to invade a population (dark grey area).

The evolution of models of evolution

This work is important for addressing the evolutionary dynamics of loci controlling phenotypic variance – in this case, controlling the ability of a phenotype to maintain its fitness even if the environment is variable. Most environments undergo significant temporal shifts, from the simple changing of the seasons to larger-scale changes such as El Niño and climate change, through which species must survive and thrive. For organisms in the wild, many alleles that confer a benefit in one environment will be deleterious when the environment and selective pressures change. There may be modifier loci that buffer the fitness of those loci in the face of changing environments. Such modifier loci have recently been found in GWAS and may be important for overall phenotypic variance. Thus, modeling the patterns of evolution for multiple loci in temporally varying environments is a key component of advancing our understanding of the patterns found in nature.

Future work

Epigenetic modifiers are a hot area of research and one potential biological mechanism to control phenotypic variance. The evolution of such epigenetic regulation is a particular research interest of Oana. Future work will continue to explore the evolutionary dynamics of epigenetic regulation and focus on applying the above results to finite populations.

Paper author: Oana Carja is a graduate student with Marc Feldman.

Reference

Oana Carja, Uri Liberman, Marcus W. Feldman, Evolution with stochastic fitnesses: A role for recombination, Theoretical Population Biology, Volume 86, June 2013, Pages 29-42, ISSN 0040-5809.