Open space within the black box
Blog author: Carlos Araya is a postdoctoral researcher in the laboratories of Michael Snyder and William Greenleaf interested in a broad range of areas, including molecular structure-function relationships, directed evolution, regulatory control, and cancer. Carlos holds a Ph.D. in Genome Sciences from the University of Washington.
We have come to appreciate the central importance of evolution in shaping the composition and dynamics of biological systems and processes. Yet, although evolution via natural selection is a simple concept, it has proven difficult to derive a quantitative theory of evolution. We sorely lack a detailed, quantitative understanding of the mechanisms and parameters that define genomic adaptation generally, not to mention how these features vary within the context of specific evolutionary pressures or combinations thereof, or how these features define evolutionary dynamics. These questions are not solely of academic interest. A contextualized, detailed knowledge of the evolutionary landscape surrounding specific biological systems may enable an array of novel solutions to issues such as antibiotic-resistance, epidemic management, cancer treatment, and genome engineering. Thus, there is much work ahead and plenty of space for breakthroughs in the field of evolutionary genomics.
In the November 2013 issue of PLoS Genetics, Daniel Kvitek and Gavin Sherlock (from our own Genetics Department) presented an elegant analysis of genomic adaptation in a constant environment [1]. Motivated by the hypothesis that signaling networks evolved to sense and respond to environmental fluctuations may also carry a fitness cost, Kvitek and Sherlock sought to refine the characterization of the landscape of adaptive mutations in S. cerevisiae cultures under continuous, nutrient-limited growth, where specific, extant signaling networks may be of limited utility.
Messages from inside
The backdrop for the new study lies in a key experiment [2] published in 2008 by the Sherlock lab, in which postdoc Katy Kao challenged classical models of adaptive evolution that postulated that adaptive clones arise serially, always deriving from the preceding, dominant adaptive lineage. In replicate experimental evolutions, cultures were seeded with equal quantities of three nearly isogenic, fluorescently tagged (GFP, YFP, and DsRed), haploid (S288c) S. cerevisiae strains and grown under glucose-limited conditions. By tracking the abundance of fluorescently labeled lineages (sub-populations), Kao and Sherlock were able to record coarse dynamics of population structure during adaptive evolution [2]. These dynamics revealed clear signals of clonal interference, where competing lineages expanded and shrank, indicating that adaptive mutations were common enough that lineages with distinct driver mutations spread and competed simultaneously, as had been previously observed in viruses [3] and bacteria [4]. Clonal interference, which has important consequences in evolution as it alters the rate of adaptation and the fate of adaptive lineages, was thus demonstrated in eukaryotes. It should be noted that clonal interference relies on large population sizes (>10⁶), a regime that is relevant to many diseases. For example, a 1 cm³ tumor has ~10⁹ cells. At diagnosis, the malignant cell population in leukemia exceeds 10¹². Bacterial infections can have similar numbers of cells, and HIV infections can contain viral counts two orders of magnitude higher.
Array-based genotyping of FACS-sorted clones (N=5) revealed that adaptive mutations were concentrated in the glucose sensing and RAS pathways [2], and a follow-up study [5] revealed strong negative epistasis between two of the most common targets of adaptive mutations, MTH1 and HXT6/7, as variants combining mutations in the two displayed lower fitness than wild-type. Both gain-of-function mutations (most commonly amplifications) of the high-affinity glucose transporters HXT6/7 and loss-of-function mutations in MTH1 (a negative regulator of the transporters) provided a selective advantage by increasing the amount of glucose able to enter the cell. Both were recurrent in the evolving population, but they never arose in the same background. A deep rift in the fitness landscape, created by intermolecular epistasis (non-additive fitness interactions between genes), separated lineages within a preferred pathway of adaptation.
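To make the idea of a "rift" concrete, here is a minimal Python sketch of how epistasis between two mutations can be scored from four fitness values. The additive expectation, the function names, and the fitness numbers are illustrative assumptions, not measurements or methods from the studies discussed here.

```python
def epistasis(w_wt, w_a, w_b, w_ab):
    """Deviation of the double mutant from the additive expectation.

    A value of zero means the two mutations combine additively;
    a negative value means the double mutant is less fit than expected.
    """
    expected = w_a + w_b - w_wt
    return w_ab - expected


def is_reciprocal_sign_epistasis(w_wt, w_a, w_b, w_ab):
    """True when each mutation is beneficial on its own but becomes
    deleterious in the background that already carries the other."""
    beneficial_alone = w_a > w_wt and w_b > w_wt
    harmful_together = w_ab < w_a and w_ab < w_b
    return beneficial_alone and harmful_together


# Hypothetical relative fitness values (wild-type set to 1.0).
W_WT, W_MUT_A, W_MUT_B, W_DOUBLE = 1.0, 1.3, 1.2, 0.9

eps = epistasis(W_WT, W_MUT_A, W_MUT_B, W_DOUBLE)
rift = is_reciprocal_sign_epistasis(W_WT, W_MUT_A, W_MUT_B, W_DOUBLE)
```

With these made-up numbers each single mutant beats wild-type, while the double mutant falls below it, which is the pattern described above for MTH1 and HXT6/7.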
These results suggested that rich mutation diversity and population dynamics would underlie the observed levels of clonal interference, which the authors hypothesized could be probed with population-level, whole-genome sequencing through the course of experimental evolution. However, given the complexity of evolved yeast populations, with population sizes of ~10⁹ cells, population-level sequencing introduces a number of challenges for the sensitive detection of mutant alleles and the accurate determination of their frequencies. For example, even at a high sequencing depth of 1,000x, a per-base sequencing error rate of 1%, as is typical of current sequencing technologies, would be expected to generate 10 reads with non-reference alleles at any given position by chance. Furthermore, only small (~1 ml) population samples had been stored in the freezer, and thus sensitive (low input) library preparation methods would be required. Lastly, the ability to detect adaptive mutations from short (≤100 nt) population sequences is, by and large, restricted to single nucleotide polymorphisms (SNPs) and short sequence insertion/deletions (indels) whose allele frequencies rise above sequencing error levels.
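The arithmetic behind the sequencing-error example can be sketched in a few lines of Python. This is a simplified illustration that assumes errors are independent across reads and follow a binomial distribution; the function names are hypothetical.

```python
from math import comb


def expected_error_reads(depth, error_rate):
    """Expected number of erroneous (non-reference) reads at one position."""
    return depth * error_rate


def prob_at_least(k, depth, error_rate):
    """P(X >= k) for X ~ Binomial(depth, error_rate), i.e. the chance that
    sequencing error alone yields k or more non-reference reads."""
    below = sum(
        comb(depth, i) * error_rate**i * (1 - error_rate) ** (depth - i)
        for i in range(k)
    )
    return 1.0 - below


# The case from the text: 1,000x depth and a 1% per-base error rate.
mean_errors = expected_error_reads(1000, 0.01)

# Chance that errors alone mimic a variant supported by >= 20 reads (2%).
p_false_signal = prob_at_least(20, 1000, 0.01)
```

Under this simple model, a true variant at 1% frequency is indistinguishable from the error background, which is why reduced error rates are needed to call low-frequency alleles.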
Uncovering and monitoring adaptive SNPs in evolving populations
Towards these goals, the authors applied recently developed experimental techniques to sequence limited amounts of genomic DNA at decreased error rates, from the previously studied [2], triplicate (haploid) S. cerevisiae cultures. For each experimental evolution (E1-3), the authors performed population-level sequencing at 8 timepoints, separated by ~70 generations, spanning ~450 generations of continuous growth. In total, 24 timepoints were sequenced. Concomitant fragmentation and adapter-tagging (so-called enzymatic 'tagmentation') with the Nextera Tn5 transposase reduced the number of requisite DNA clean-ups and thus limited nucleic acid loss during library preparation, allowing robust library preparation from limited input samples.
Population-sequencing reads were mapped to a custom reference genome, corresponding to the ancestral GSY1136 strain. A fixed-barcode, wild-type library spiked into each population library permitted lane-specific, base-quality calibration and non-reference allele quantification. To determine SNPs from population-level sequences, allele counts at each position were compared to reference allele counts from wild-type libraries, and SNPs were called at sites with significant (multiple hypothesis- and FDR-corrected) enrichments in non-reference alleles. Positions with q-value < 0.01 were retained and SNPs were heuristically filtered for further quality refinement. Barcoded adapters allowed PCR-duplicate identification and removal, and paired read-overlap correction methods (as pioneered in deep mutational scanning [6,7]) allowed sufficient error rate reduction to support variant identification for mutant alleles present at frequencies as low as 1% of the population. All but one of the SNPs with a maximum allele frequency ≥10% were validated.
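The general shape of this kind of SNP calling can be sketched as follows. The specific statistical test used in the paper is not described here, so this sketch assumes a one-sided Fisher's exact test for enrichment followed by Benjamini-Hochberg correction; the function names and read counts are hypothetical.

```python
from math import comb


def fisher_enrichment_p(alt_pop, depth_pop, alt_wt, depth_wt):
    """One-sided Fisher's exact test: probability of seeing at least
    alt_pop non-reference reads in the population sample, given the
    combined counts from the population and wild-type libraries."""
    total_alt = alt_pop + alt_wt
    total = depth_pop + depth_wt
    upper = min(total_alt, depth_pop)
    numerator = sum(
        comb(total_alt, k) * comb(total - total_alt, depth_pop - k)
        for k in range(alt_pop, upper + 1)
    )
    return numerator / comb(total, depth_pop)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted q-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def call_snps(sites, q_cutoff=0.01):
    """sites: list of (alt_pop, depth_pop, alt_wt, depth_wt) tuples.
    Returns the indices of sites passing the q-value cutoff."""
    pvalues = [fisher_enrichment_p(*site) for site in sites]
    qvalues = benjamini_hochberg(pvalues)
    return [i for i, q in enumerate(qvalues) if q < q_cutoff]


# Hypothetical read counts: the first site is enriched, the second is not.
example_sites = [(60, 1000, 2, 1000), (3, 1000, 2, 1000)]
called = call_snps(example_sites)
```

A real pipeline would add the heuristic filters mentioned above (and base-quality calibration) on top of this statistical core.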
What lies within: clonal interference and mutation cohorts
Applying these methods uncovered 117 mutations in 51 genes, of which 37% (19) were recurrently mutated, suggesting parallel evolution in distinct populations and lineages. Mutations reached fixation in only one of the three populations (E3). Assuming a DNA mutation rate of 10⁻¹⁰ per base, per generation, these mutations represent ~0.002% of all de novo mutations arising across populations.
The observed mutation dynamics reveal a striking prevalence of clonal interference, whereby beneficial mutations that rose in frequency often succumbed to alternate expanding lineages prior to fixing in the population (Fig. 1). This phenomenon strongly decouples the maximum and the final allele frequencies in evolving populations, and introduces complex lineage dynamics. In effect, 63% of the identified mutations decreased in frequency from their maxima, and 36% of identified mutations (which necessarily rose to ≥1% frequencies for detection) became extinct. Thus, even within a continuous environment the fate of competing mutations cannot be predicted, as diverse adaptive solutions continue to arise within heterogeneous genetic backgrounds. These results are consistent with previous observations of clonal interference in diverse evolving populations [8-10] and point to the continued and therefore combinatorial introduction of adaptive (as well as neutral and mal-adaptive) mutations. Not skipping a beat, the authors rolled up their sleeves and set out to address combinatorial mutations.
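Summary statistics of this kind can be derived directly from allele-frequency time series. The Python sketch below shows one way to do so; the trajectories are invented for illustration, and treating "extinct" as "below the 1% detection limit at the final timepoint" is an assumption of this sketch rather than the paper's stated definition.

```python
def summarize_trajectories(trajectories, detection_limit=0.01):
    """trajectories: dict mapping a mutation name to its list of allele
    frequencies over time. Returns (fraction that ended below their
    maximum, fraction undetectable at the final timepoint)."""
    n = len(trajectories)
    declined = sum(1 for f in trajectories.values() if f[-1] < max(f))
    extinct = sum(1 for f in trajectories.values() if f[-1] < detection_limit)
    return declined / n, extinct / n


# Hypothetical allele-frequency trajectories at five timepoints.
example = {
    "mut_A": [0.00, 0.05, 0.30, 0.80, 1.00],  # sweeps to fixation
    "mut_B": [0.00, 0.10, 0.40, 0.20, 0.00],  # rises, then is outcompeted
    "mut_C": [0.00, 0.02, 0.15, 0.10, 0.05],  # declines but persists
    "mut_D": [0.00, 0.00, 0.02, 0.10, 0.30],  # still rising at the end
}

frac_declined, frac_extinct = summarize_trajectories(example)
```

The gap between the two fractions captures the decoupling of maximum and final allele frequencies: a mutation can fall well below its peak without disappearing from the population.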
Figure 1. Visualizing the dynamics and linkage of mutations with frequencies above 10% in the E1 (A), E2 (B), and E3 (C) evolutions.
Genotyping individual adaptive clones by Sanger-sequencing (N≈10² clones), Kvitek and Sherlock uncovered extensive linkage among mutations above 10% frequency. Over 90% of these mutations occur in cis with other mutations. To untangle beneficial and neutral mutations, the authors deemed recurrent independent mutations as beneficial (a reasonable assumption given that most mutations are deleterious [7]) and revealed that all defined lineages harbor at least one such beneficial mutation. In each evolution, the final, highest-frequency lineage harbored at least three beneficial mutations, indicating that lineage success frequently requires multiple beneficial mutations and is thus non-deterministic.
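The recurrence criterion can be expressed as a simple grouping step. In this Python sketch, a gene counts as recurrently mutated when it carries at least two independent mutations, where "independent" is taken to mean a distinct (population, position) pair; that definition, the function name, and the placeholder records are assumptions made for illustration.

```python
from collections import defaultdict


def recurrent_genes(mutations, min_independent=2):
    """mutations: iterable of (gene, population, position) tuples.
    Returns the set of genes hit by at least min_independent distinct
    (population, position) events."""
    events = defaultdict(set)
    for gene, population, position in mutations:
        events[gene].add((population, position))
    return {g for g, hits in events.items() if len(hits) >= min_independent}


# Placeholder records; gene names and coordinates are hypothetical.
example_mutations = [
    ("GENE_A", "E1", 101),
    ("GENE_A", "E2", 250),  # second, independent hit in another population
    ("GENE_B", "E1", 77),
    ("GENE_B", "E1", 77),   # same event observed twice, counted once
    ("GENE_C", "E3", 12),
    ("GENE_C", "E3", 480),  # two different sites in the same population
]

candidates = recurrent_genes(example_mutations)
```

Genes returned by such a filter are candidates for carrying beneficial mutations; mutations in the remaining genes may still be beneficial but cannot be distinguished from hitchhikers by recurrence alone.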
Naturally, the extent of clonal interference and mutation linkage are defined by population size, the rate of mutation, and the fraction of adaptive, neutral, and mal-adaptive mutations. The latter three parameters can vary strongly among selective environments, and even during lineage progression, as demonstrated by decelerating fitness gains during adaptation [11]. These data invite quantitative modeling to derive the true underlying adaptive mutation rate needed to support the observed frequency of adaptive mutations (≥10%), taking advantage of the known mutation rate, population size, and generation times. However, such analysis would necessitate modeling the underlying fitness distribution of mutations.
The prevalence of genetic hitchhiking, whereby lineages with multiple mutations rise and fall during evolution, implies that epistasis can play a major role in shaping population dynamics and evolutionary trajectories. Consistent with this, recent reports [12-15] have suggested widespread epistasis in genome and protein evolution. Whereas a large fraction of protein intramolecular epistasis is accounted for by a robustness/protein stability axis [16], fundamental questions regarding the source of intermolecular epistasis remain. We now know that the decelerating fitness gains observed during adaptation can arise from diminishing-returns epistasis [11], whereby the sequential combination of beneficial mutations leads to gradually lower fitness gains within lineages as pathway optimization proceeds. Conversely, sudden and dramatic changes can occur in evolving populations when innovative beneficial mutations arise that can exploit new ecological niches. These innovations can in turn show all-or-none epistasis [17] when an evolved genetic background is a prerequisite for the adaptive phenotype. However, it is presently unclear whether intermolecular epistasis signatures are primarily due to exchange costs between adaptations to diverse and fluctuating environments, or whether epistasis occurs primarily among adaptive mutations to the same selective pressures.
Uncovering adaptive strategies
Monitoring the rise, fall and linkage of mutations revealed critical population dynamics at play in adaptation, but does not speak to the specific cellular strategies conferred by adaptive mutations. Here, pathway analysis revealed that 53% of mutations in recurrently mutated genes lay within three major signaling pathways: glucose signaling and transport, cAMP/PKA, and the high-osmolarity glycerol (HOG) pathway. In addition, recurrent mutations were observed in sterol metabolism and cell-cycle genes (ACE2, WHI2). Tellingly, nonsense mutations (which truncate proteins) were 7.6x more frequent than expected by chance, which provides evidence that disruption of signaling networks is a common and effective adaptive strategy to increase fitness in non-fluctuating environments.
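As a rough illustration of what "expected by chance" can mean for nonsense mutations, the Python sketch below enumerates every single-nucleotide substitution in every sense codon of the standard genetic code and counts how many create a stop codon. It assumes uniform codon usage and equal rates for all substitutions, a deliberately crude null model that is not necessarily the one used in the paper.

```python
from itertools import product

BASES = "ACGT"
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def nonsense_substitution_counts():
    """Count single-nucleotide substitutions in sense codons.
    Returns (substitutions creating a stop codon, all substitutions)."""
    sense_codons = [
        "".join(c) for c in product(BASES, repeat=3)
        if "".join(c) not in STOP_CODONS
    ]
    to_stop = 0
    total = 0
    for codon in sense_codons:
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue  # not a substitution
                mutant = codon[:pos] + base + codon[pos + 1:]
                total += 1
                if mutant in STOP_CODONS:
                    to_stop += 1
    return to_stop, total


def fold_enrichment(observed_fraction, expected_fraction):
    """Ratio of the observed nonsense fraction to the null expectation."""
    return observed_fraction / expected_fraction


n_stop, n_total = nonsense_substitution_counts()
expected_fraction = n_stop / n_total  # 23 of 549 under this null model
```

Under these assumptions only about 4% of coding substitutions are expected to be nonsense, so an observed fraction several-fold higher points to selection for loss of function. A more faithful null would weight codons by their usage in the yeast genome and account for mutational biases.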
These results demonstrate that signaling networks impose a fitness cost on cellular systems, and raise the question of what specific efficiencies are gained, or rather, what inefficiencies are removed, by disrupting signaling systems. Do the (competitive) fitness defects arise from opportunity costs of delayed response, from metabolic costs, or a combination thereof? Are phosphorylation cascades, with their copious requirements for ATP, particularly sensitive to degradation under continuous evolution? Are upstream or downstream signaling components more susceptible to disruption, and how do feedback systems affect this balance? These results suggest that collaborations between the fields of experimental evolution and systems modeling may help (1) pinpoint the specific energy gains afforded by disrupting specific signaling components and systems (e.g., the MAPK pathway), (2) determine which signaling architectures are more robust or energy efficient, and (3) refine metabolic network circuitry. In addition, imbalances in metabolic flux may underlie a substantial portion of intermolecular epistatic interactions between adaptive mutations. Thus, coupling recent developments in cellular modeling [18] with laboratory evolution may prove conducive to analyses at the intersection of evolution, systems biology, and epistasis.
Significance & Future Directions
The findings summarized here speak to the power of experimental evolution. Dissection of a single experimental evolution has uncovered clonal interference in a eukaryotic system, reciprocal sign epistasis, the widespread success of lineages with multiple beneficial mutations, and the fitness cost of signaling in a continuous environment. Striking features, such as clonal interference and the success of lineages with multiple beneficial mutations, speak to the frequency of adaptive mutations, and have now been observed in several systems. In an excellent study [10] applying population-level sequencing to forty evolving cultures of S. cerevisiae grown in rich medium, these features were observed as nested sweeps (whereby one mutational sweep initiates before a previous one has completed) and mutation cohorts, temporal clusters of mutations on shared genetic backgrounds. These features established, future work into the quantification of clonal interference, adaptive mutation rates, the prevalence of epistasis, and metabolic costs will allow refined mathematical models for predictive and diagnostic analysis of evolutionary processes.
Yet, significant technical challenges remain. For example, we do not at present know how much of the variance in fitness can be accounted for by SNPs. Owing to the difficulty of assessing copy-number variants (CNVs) from short sequences, this important class of adaptive mutations has remained unstudied in population-level studies performed to date. Importantly, previous experiments [19] have shown that amplification of the high-affinity glucose transporters (HXT6/7) is a frequent genomic adaptation to glucose limitation, and careful dissection of adaptive amplifications [20] has hinted at novel mechanisms of DNA recombination [21]. Such limitations, however, may eventually be overcome by (long-awaited) long-read sequencing technologies and methods for comprehensive mutation tracking, serving both to enhance structural variant detection and to enable haplotype-resolved variant tracking. These developments would allow in-depth, genome-wide inquiry into the prevalence and role of epistasis in modulating accessibility to and reproducibility of adaptive mutations.
Similarly, increased sequencing depths coupled with reduced sequencing error rates will allow examination of increasingly lower-representation genotypes, allowing ever higher-resolution analysis of population dynamics. For example, recently developed methods [22] that combine rolling-circle amplification and population sequencing to achieve error rates of ~10⁻⁶ per base have now been applied to study poliovirus evolution at unprecedented scale [23]. The drastically reduced error rates permitted Acevedo et al. [23] to detect mutations at frequencies two orders of magnitude below the reported mutation frequencies in poliovirus populations (~2 x 10⁻⁴ per base). Combined with the increased coverage (~200,000x) afforded by the small (~7.5 kb) genome, the approach revealed a staggering diversity of mutations, most of which are present at very low frequencies in the population (10⁻³ to 10⁻⁵) [23].
The frequent observation of adaptive, loss-of-function mutations in multiple distinct pathways suggests that the specific selective pressure studied is permissive to large numbers of adaptive mutations. As gain-of-function mutations are infrequent relative to loss-of-function mutations, we can expect the adaptive mutation rate and dynamics of evolution in other environments to differ substantially. Future applications of whole-genome, whole-population sequencing approaches with increased read-length and fidelity will provide a fruitful avenue to reveal ever more intricate mechanisms of adaptive response.
Finally, it is likely that methods from the (currently human-biased) genome interpretation field may provide richer analyses into the functional roles of adaptive mutations. Phenotype ontologies, mutation prioritization tools, and improved methods for assessing the impact of coding (beyond first-generation programs such as SIFT and PolyPhen) and regulatory sequence mutations are as relevant to experimental evolution as to human diagnostics. In turn, we can expect findings from experimental evolution to help establish a framework for understanding the dynamics of aberrant cancer genomes, antibiotic resistance, and immune evasion. Learning to detect signatures of selection and distinguish modes of population dynamics within these genomes may prove paramount to treatment.
Paper author: Gavin Sherlock is associate professor in the Genetics Department.
Paper author: Dan Kvitek completed his Ph.D. in 2013 in the laboratory of Gavin Sherlock. Dan now combines experimental and computational research to diagnose genetic variants at Invitae.
1. Kvitek, D. J. & Sherlock, G. Whole genome, whole population sequencing reveals that loss of signaling networks is the major adaptive strategy in a constant environment. PLoS Genet 9, e1003972 (2013).
2. Kao, K. C. & Sherlock, G. Molecular characterization of clonal interference during adaptive evolution in asexual populations of Saccharomyces cerevisiae. Nat Genet 40, 1499–1504 (2008).
3. Miralles, R., Gerrish, P. J., Moya, A. & Elena, S. F. Clonal interference and the evolution of RNA viruses. Science 285, 1745–1747 (1999).
4. Perfeito, L., Fernandes, L., Mota, C. & Gordo, I. Adaptive mutations in bacteria: high rate and small effects. Science 317, 813–815 (2007).
5. Kvitek, D. J. & Sherlock, G. Reciprocal sign epistasis between frequently experimentally evolved adaptive mutations causes a rugged fitness landscape. PLoS Genet 7, e1002056 (2011).
6. Fowler, D. M. et al. High-resolution mapping of protein sequence-function relationships. Nat Methods 7, 741–746 (2010).
7. Araya, C. L. et al. A fundamental protein property, thermodynamic stability, revealed solely from large-scale measurements of protein function. Proceedings of the National Academy of Sciences (2012). doi:10.1073/pnas.1209751109
8. de Visser, J. A. G. M. & Rozen, D. E. Clonal interference and the periodic selection of new beneficial mutations in Escherichia coli. Genetics 172, 2093–2100 (2006).
9. Toprak, E. et al. Evolutionary paths to antibiotic resistance under dynamically sustained drug selection. Nat Genet 44, 101–105 (2011).
10. Lang, G. I. et al. Pervasive genetic hitchhiking and clonal interference in forty evolving yeast populations. Nature 500, 571–574 (2013).
11. Chou, H.-H., Chiu, H.-C., Delaney, N. F., Segrè, D. & Marx, C. J. Diminishing returns epistasis among beneficial mutations decelerates adaptation. Science 332, 1190–1192 (2011).
12. Corbett-Detig, R. B., Zhou, J., Clark, A. G., Hartl, D. L. & Ayroles, J. F. Genetic incompatibilities are widespread within species. Nature 504, 135–137 (2013).
13. Breen, M. S., Kemena, C., Vlasov, P. K., Notredame, C. & Kondrashov, F. A. Epistasis as the primary factor in molecular evolution. Nature (2012). doi:10.1038/nature11510
14. Natarajan, C. et al. Epistasis among adaptive mutations in deer mouse hemoglobin. Science 340, 1324–1327 (2013).
15. Weinreich, D. M., Delaney, N. F., Depristo, M. A. & Hartl, D. L. Darwinian evolution can follow only very few mutational paths to fitter proteins. Science 312, 111–114 (2006).
16. Bershtein, S., Segal, M., Bekerman, R., Tokuriki, N. & Tawfik, D. S. Robustness-epistasis link shapes the fitness landscape of a randomly drifting protein. Nature 444, 929–932 (2006).
17. Meyer, J. R. et al. Repeatability and Contingency in the Evolution of a Key Innovation in Phage Lambda. Science 335, 428–432 (2012).
18. Karr, J. R. et al. A whole-cell computational model predicts phenotype from genotype. Cell 150, 389–401 (2012).
19. Gresham, D. et al. The repertoire and dynamics of evolutionary adaptations to controlled nutrient-limited environments in yeast. PLoS Genet 4, e1000303 (2008).
20. Araya, C. L., Payen, C., Dunham, M. J. & Fields, S. Whole-genome sequencing of a laboratory-evolved yeast strain. BMC Genomics 11, 88 (2010).
21. Brewer, B. J., Payen, C., Raghuraman, M. K. & Dunham, M. J. Origin-dependent inverted-repeat amplification: a replication-based model for generating palindromic amplicons. PLoS Genet 7, e1002016 (2011).
22. Lou, D. I. et al. High-throughput DNA sequencing errors are reduced by orders of magnitude using circle sequencing. Proceedings of the National Academy of Sciences 110, 19872–19877 (2013).
23. Acevedo, A., Brodsky, L. & Andino, R. Mutational and fitness landscapes of an RNA virus revealed through population sequencing. Nature 505, 686–690 (2014).