Catching the drift of experimental evolution.

Blog author Doc Edge is a graduate student in Noah Rosenberg's lab.

One picture of science, popular in dramatic depictions of scientific history, shows an isolated theorist working out the implications of a bold idea. In this picture, the theorist relies, somewhat indifferently, on the ingenuity of an unknown, unspecified experimentalist who will someday—perhaps many years later—test the theorist’s ideas against unforgiving reality. It’s trite to point out that this picture isn’t a good representation of actual scientific practice. The relationship between theory and experiment both is and should be more bidirectional, more collaborative, and more nuanced than this, as philosophers of science, sociologists of science, and scientists themselves have pointed out for decades. A recent paper by CEHG graduate student Arbel Harpak (Pritchard lab) and Guy Sella (Stanford PhD 2001 from Marc Feldman’s group, in the less-sunny, pre-CEHG days) is a lesson in how experimentally-motivated theoretical work can inform both future theory and future experiment.

(Neutral) Evolution in a Test Tube

Harpak and Sella take up one of the most interesting contexts for the study of evolution: the serial-transfer experiment. In serial-transfer experiments, microorganisms (usually E. coli or yeast) are allowed to divide in an isolated container with some nutrients. Once the population in a container reaches a pre-specified size (which Harpak and Sella call N2), a sample of a pre-specified size (N1) is taken from the original container and transferred to a new one. This process of growth, sampling, and transfer is then repeated many times.

A schematic of a serial transfer experiment (from Sprouffske et al., 2012) [Note: in a previous version this image was mistakenly attributed to A. Harpak.]

Both experimenters and theorists have focused on understanding selection in serial-transfer experiments. This is sensible: the serial-transfer scenario allows researchers to manipulate the parameters that influence adaptation (population size, selective pressure, etc.) and to study the dynamics of adaptation. Harpak and Sella start from a premise that has been less widely appreciated in the serial-transfer context: namely, that the other forces of evolution, including drift and demography, are also active in serial-transfer experiments. These selectively neutral forces are ever-present, running in the background of experiments designed to study selection. Harpak and Sella do not claim that neutral forces are as important as, or more important than, selective forces in a typical serial-transfer experiment; they acknowledge that selective forces likely predominate. But they rightly point out that we cannot fully understand what selection does without understanding the context in which it acts. Population geneticists have known this for a long time, but until now, they have not formally applied this insight to serial-transfer experiments.

Harpak and Sella develop a fully articulated model for neutral evolution in a serial-transfer context. In population-genetic terms, the outcome of a serial-transfer experiment is the product of repeated periods of growth followed by bottlenecks. They build on the intuitions that population geneticists already have about population growth and bottlenecks to sketch a picture of what serial transfers will do to measurements of genetic diversity. Diversity slowly builds up as the population grows and as more distinct lineages appear in the population. But when a small subset of the population is removed and transferred to a new container, only the diversity that is represented in that small subset will be present in the next phase of the population’s growth. When the experiment involves many cycles of serial transfer, the genetic diversity ends up looking like that of a population with a constant size of N1*log2(N2/N1), where N1 is the number of organisms transferred during each cycle and N2 is the size to which the population grows just before the end of the cycle. This is a neat result: as has been found in other contexts, the genetic diversity is more strongly influenced by the severity of the bottleneck than it is by the maximum size to which the population grows.
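To get a feel for the expression N1*log2(N2/N1), here is a minimal Python sketch. The function names and the example values of N1 and N2 are illustrative assumptions, not numbers taken from Harpak and Sella's paper, and the sketch simply evaluates the formula as quoted in this post.

```python
import math


def generations_per_cycle(n1, n2):
    """Number of doublings needed to grow from n1 cells to n2 cells."""
    if n1 <= 0 or n2 < n1:
        raise ValueError("expected 0 < n1 <= n2")
    return math.log2(n2 / n1)


def effective_size(n1, n2):
    """Constant population size with comparable diversity: N1 * log2(N2 / N1)."""
    return n1 * generations_per_cycle(n1, n2)


if __name__ == "__main__":
    # Hypothetical experiment: transfer 1,000 cells, grow to 1,024,000 cells.
    base = effective_size(1_000, 1_024_000)
    # Same bottleneck, but let the population grow 1,000 times larger.
    bigger_peak = effective_size(1_000, 1_024_000_000)
    # Same peak size, but a bottleneck 10 times wider.
    wider_bottleneck = effective_size(10_000, 1_024_000)

    print(f"baseline:          {base:,.0f}")
    print(f"1000x larger N2:   {bigger_peak:,.0f}")
    print(f"10x larger N1:     {wider_bottleneck:,.0f}")
```

In this toy comparison, multiplying the peak size N2 by a thousand only about doubles the result, whereas widening the bottleneck N1 tenfold raises it several-fold. That is the sense in which the bottleneck dominates.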

Relaxing the Assumptions

This result agrees with population geneticists’ intuitions about demographic changes and genetic diversity. Nonetheless, there are two reasons why it is unrealistic for actual serial-transfer experiments. First, most serial-transfer experiments are not long enough to reach the predicted equilibrium state. Second, the equilibrium prediction assumes that all the microorganisms in the sample divide at the same time, but in reality, there is variation in the length of time that a cell persists before dividing. Harpak and Sella adjust their model to deal with both of these issues. In considering the shorter time scales typical of serial-transfer experiments, they find that the shorter the experiment is, the more the genetic diversity will be dominated by rare variation—genetic differences seen only in a single individual, for example. They also derive results for another version of their model in which the length of time to the next cell division is random. When cells vary in how long they take to divide, the average number of differences between any two cells decreases. This is because cells that divide quickly are likely to leave more offspring in subsequent generations. This means that any two cells in a subsequent generation are more likely to be recently related, since there is an increased chance that they both descend from some quickly-dividing recent ancestor.

Harpak and Sella’s paper is a step toward a fully-realized population-genetic framework for serial transfer experiments and for experimental evolution more generally. We have long known that models of evolution have to account for selectively-neutral forces in addition to selection, and it is only sensible to think of evolutionary experiments in the same way. By responding to a frequently-used experimental setup, Harpak and Sella have done a service for experimental evolution, providing a better understanding of the selectively neutral processes that act in serial transfer experiments. At the same time, Harpak and Sella perform a service that is perhaps even more important for population-genetic theory. Evolutionary experiments are an unprecedented opportunity to learn about the actions of evolutionary forces at a large scale. Moreover, experiments offer researchers the power to control the setting in which evolution takes place and to repeat evolutionary processes. Population-genetic theorists, who until recently have had to make do with the single iteration of evolution that nature has provided, ought to jump at the chance to test their ideas in real-time. Harpak and Sella’s paper is an example of how theorists attending closely to experimental methods can lay the ground for improved experiments and improved theories.


Harpak A, Sella G (2014) Neutral null models for diversity in serial transfer evolution experiments. Evolution. DOI: 10.1111/evo.12454

Sprouffske K, Merlo LM, Gerrish PJ, Maley CC, Sniegowski PD (2012) Cancer in light of experimental evolution. Current Biology 22(17): R762-R771. DOI: 10.1016/j.cub.2012.06.065

Paper author Arbel Harpak is a graduate student in the Pritchard lab.


Which genetic variants determine histone marks?


Blog author Joe Davis is a graduate student with Stephen Montgomery & Carlos Bustamante.

Most of the genetic variation in the human genome is found not within protein-coding genes but within non-coding regions. This comes as no surprise given that only about 1% of the genome codes for proteins. Until recently, efforts to determine the effects of genetic variation on trait variation and disease have focused on coding regions. Results of genome-wide association studies (GWAS), however, have shown that trait- and disease-associated variants are often regulatory variants, such as expression quantitative trait loci (eQTLs), found in non-coding regions. These results have spurred an effort to understand the functional role of non-coding, regulatory variation. Efforts have thus far relied on characterizing the association between variants and gene expression. This association alone, however, will not reveal the complete functional mechanism by which non-coding variants influence gene expression. Recent efforts have therefore begun to characterize numerous molecular phenotypes such as transcription factor (TF) binding, histone modification, and chromatin state to determine the mechanisms by which regulatory variants affect gene expression.

One issue, four papers

In the November 8 issue of Science, three papers were published that address the role of non-coding genetic variation on TF binding, histone modifications, and chromatin state (i.e. active versus inactive enhancer status). The first study was completed by the Dermitzakis Lab at the University of Geneva. They analyzed three TFs, RNA polymerase II (Pol II), and five histone modifications using chromatin immunoprecipitation and sequencing (ChIP-Seq) in lymphoblastoid cell lines (LCLs) from two parent-child trios [1]. The second was completed by the Pritchard Lab, which has recently moved to Stanford, and the Gilad Lab at the University of Chicago. They identified genetic variants affecting variation in four histone modifications and Pol II occupancy in ten unrelated Yoruba LCLs [2]. The third study was performed by the Snyder Lab at Stanford. They characterized the genetic variation underlying changes in chromatin state using RNA-Seq and ChIP-Seq for four histone modifications and two DNA binding factors in 19 LCLs from diverse populations [3]. This work was the subject of a recent CEHG Evolgenome talk given by Maya Kasowski, the study’s first author. Finally, the fourth study, published in the November 28 issue of Nature, was performed by the Glass Lab at UCSD. They characterized the effect of natural genetic variation between two mouse strains on the binding of two TFs involved in cell differentiation (PU.1 and C/EBPα) using ChIP-Seq [4]. In this post, I will analyze primarily the work presented by the Pritchard Lab, but I strongly recommend reading all four papers to understand the challenges in characterizing non-coding variation and the methods available to do so.


The four studies seek to answer the general question of how regulatory variation affects gene expression. They characterize diverse molecular phenotypes such as histone modifications and TF binding to understand the mechanisms of action for non-coding variants. The Pritchard Lab focused their study on four histone modifications (three active and one repressive: H3K4me3, H3K4me1, H3K27ac, and H3K27me3, respectively) and Pol II occupancy.


Histone modifications 101

Histone modifications refer to the addition of chemical groups such as methyl or acetyl to specific amino acids on the tails of the histone proteins that make up the nucleosome. These chemical groups are referred to as histone marks. They can serve a wide range of functions, but in general they are associated with the accessibility of a chromatin region. For example, the tri-methylation of lysine 4 of histone 3 (H3K4me3) is associated with increased chromatin accessibility and gene activation. On the other hand, increased levels of the repressive mark H3K27me3 (tri-methylation of lysine 27 of histone 3) at promoters are associated with gene inactivation.

Histone mark levels are measured in a high-throughput manner using ChIP-Seq. Briefly, an antibody targeting the mark of interest is used to pull down modified genomic regions. These immunoprecipitated regions are then sequenced to determine which genomic segments are modified and at what level. The procedure usually requires a large number of cells (on the order of 10^7). Therefore, the modification level is, in some ways, a population level measurement. Analysis of ChIP-Seq data typically involves testing for genomic regions with more reads than expected by chance. These regions, ranging from 200bp to 1000bp or more, are referred to as peaks that represent a modification level above the genomic background. Repressive marks like H3K27me3 tend to have broad peak regions, while activating marks like H3K4me3 can have much tighter peaks.
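The idea of "more reads than expected by chance" can be sketched with a simple Poisson tail probability. This is only an illustration under strong assumptions: a uniform genomic background, a fixed window, and hypothetical read counts. Real peak callers use more elaborate background models, and nothing here is taken from the papers discussed.

```python
import math


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    if lam < 0:
        raise ValueError("lam must be non-negative")
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def is_candidate_peak(reads_in_window, expected_reads, alpha=1e-5):
    """Flag a window whose read count is unlikely under the background rate.

    alpha is an arbitrary illustrative threshold, not a recommended value.
    """
    return poisson_upper_tail(reads_in_window, expected_reads) < alpha


if __name__ == "__main__":
    # Hypothetical window: background predicts 10 reads on average.
    for observed in (12, 20, 40):
        p = poisson_upper_tail(observed, 10.0)
        print(observed, f"{p:.3g}", is_candidate_peak(observed, 10.0))
```

Because the tail is computed as one minus a sum, very small p-values lose precision. That is acceptable for a sketch, but a real analysis would work on a log scale and correct for testing many windows.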

Since modification levels represent measurements on a population of cells and histone residues can have multiple modifications, genomic regions can show evidence for multiple marks. The combinations of these marks over a region can mark the function of the region. For example, regions with high levels of H3K27ac and a high ratio of H3K4me1 to H3K4me3 can mark active enhancer regions. Until now, the variation of these marks between individuals and the genetic cause of this variation was uncharacterized. Moreover, the causal impact of these marks remains unknown. Do they alter gene expression directly or are they altered by gene regulation? Therefore, the two guiding questions for this study are:

1. What genetic variants influence histone modifications?

2. Are these modifications “a cause or a consequence of gene regulation?”

Variation in histone modifications, a real whodunit

The authors first seek to identify and characterize genetic variants that influence histone marks. They generated ChIP-Seq data for the four histone marks and Pol II in LCLs derived from ten unrelated Yoruba individuals who were previously genotyped as part of the 1000 Genomes Project. Comparable studies of regulatory variants, such as eQTL studies, require large sample sizes to detect the effects of regulatory variants that often lie outside the gene. Histone marks, by contrast, cover fairly broad regions that often encompass the causal regulatory variants. As a result, the authors can use a smaller sample size and still be confident about interrogating the effects of causal regulatory SNPs. The authors developed a statistical test that models total read depth between individuals and allelic imbalance between haplotypes within individuals to increase power to detect cis-QTLs (i.e. variants that affect histone marks and Pol II occupancy nearby in the genome). Using this method, they identified over 1200 distinct QTLs for histone marks and Pol II occupancy (FDR 20%).
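To make "allelic imbalance" concrete, here is a minimal sketch of an exact binomial test at a single heterozygous site. It is not the authors' combined test, which also models total read depth across individuals. It only illustrates the within-individual component, assuming reads from the two alleles are equally likely when the variant has no effect. The function name and read counts are hypothetical.

```python
from math import comb


def allelic_imbalance_pvalue(ref_reads, alt_reads):
    """Two-sided exact binomial p-value against a 50/50 allelic ratio."""
    if ref_reads < 0 or alt_reads < 0:
        raise ValueError("read counts must be non-negative")
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no reads, no evidence either way
    k = max(ref_reads, alt_reads)
    # Upper tail P(X >= k) under Binomial(n, 0.5); by symmetry the
    # two-sided p-value is twice this tail, capped at 1.
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
    return min(1.0, 2.0 * upper_tail)


if __name__ == "__main__":
    # Hypothetical ChIP-Seq read counts overlapping one heterozygous SNP.
    for ref, alt in ((12, 10), (30, 8), (10, 0)):
        print(ref, alt, f"{allelic_imbalance_pvalue(ref, alt):.3g}")
```

One appeal of the allelic comparison is that both haplotypes sit in the same cells, so many sources of between-sample noise cancel out, which helps when only ten individuals are available.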

The authors then analyze these histone mark and Pol II QTLs to determine the overlap of these variants with other known regulatory variants. The hypothesis is that regulatory variants that affect gene expression will have effects on diverse molecular phenotypes. Therefore, variants that influence histone marks and Pol II should show significant overlap with known regulatory variants such as eQTLs and DNase I sensitivity QTLs (dsQTLs). DNase I sensitivity is a measure of chromatin accessibility, with higher sensitivity associated with higher accessibility. The Pritchard Lab mapped eQTLs and dsQTLs in a larger sample of ~75 Yoruba LCLs in two previous studies that I also recommend reading [5,6]. Their analysis revealed an enrichment of low p-values for dsQTLs and, to a lesser extent, eQTLs when tested as histone mark and Pol II QTLs. In addition, the authors observed a coordinated change in multiple molecular phenotypes at dsQTLs and eQTLs. For example, higher levels of the three active histone marks were observed at dsQTLs for the more DNase I sensitive genotype. At eQTLs, H3K4me3, H3K27ac, and Pol II levels were higher for individuals with the high expression genotype. These results show that non-coding regulatory variants impact multiple molecular phenotypes ranging from chromatin accessibility and transcription to histone modifications. The authors provide strong evidence in response to their first guiding question, namely that non-coding regulatory polymorphisms associate with variation in histone marks and Pol II.

TFs and a question of directionality

The authors then turn to the question of causality for these marks. To do so, they analyze genetic variants in TF binding sites (TFBSs). The main hypothesis is that regulatory variants that alter a TFBS will modify TF binding, which will in turn cause changes in histone mark and Pol II levels nearby. If this is the case, then changes in histone marks are a consequence of how strong the TF binding site is. On the other hand, if these marks were causal, polymorphisms in TF binding sites would not be expected to show strong association with changes in these marks.

To test their hypothesis, the authors examine ~11.5K TF binding sites with polymorphisms heterozygous in at least 1 of their 10 individuals. They calculate the change in position weight matrix (PWM) score between the two alleles for polymorphic TF binding sites within each individual. They then test for significant association between this change in PWM score and allelic imbalance of ChIP-Seq reads at nearby heterozygous sites. The idea is that if a variant improves (or disrupts) TF binding for one allele at a TF binding site, then active histone marks nearby on the same allele will increase (or decrease). Repressive histone marks (in this case H3K27me3) are expected to have the opposite response. Indeed, when they apply their test, they find a significant positive association for the active marks and a negative association for the repressive mark. This result supports the hypothesis that changes in histone marks are a consequence of TF binding and gene regulation. However, this result does not rule out other possibilities. Histone marks can still play a causal role in the establishment of TF binding. In other words, the relationship between TF binding and histone marks does not have to be unidirectional. In addition, there is evidence that long non-coding RNAs may play a role in the establishment and regulation of histone marks.
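The "change in PWM score between the two alleles" can be illustrated with a toy log-odds calculation. The four-position motif below, the uniform background, and the function names are all hypothetical and chosen only to keep the example small. They are not the motifs or the scoring scheme used in the paper.

```python
import math

# Hypothetical 4-bp motif: probability of each base at each position.
# The consensus sequence of this toy motif is "AGCT".
TOY_PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform base frequency


def pwm_score(seq, pwm=TOY_PWM):
    """Log2-odds score of a sequence against the motif versus background."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must match motif length")
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(pwm, seq))


def delta_pwm(ref_seq, alt_seq, pwm=TOY_PWM):
    """Score change going from the reference allele to the alternate allele."""
    return pwm_score(alt_seq, pwm) - pwm_score(ref_seq, pwm)


if __name__ == "__main__":
    # A SNP that changes the first base of a consensus site from A to T
    # lowers the score, i.e. the alternate allele is predicted to bind worse.
    print(f"{delta_pwm('AGCT', 'TGCT'):.3f}")
```

Under the hypothesis described above, a site with a negative score change for one allele would be expected to show fewer active-mark reads, and more H3K27me3 reads, on that same allele.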

dsQTLs and eQTLs, a match made on chromatin

In their final analysis, the authors examine dsQTLs that are also eQTLs. Since these variants associate with both gene expression and chromatin accessibility at distal regulatory regions (>5kb from associated TSS), the authors can assign the regulatory region to a specific gene. A variant that is both a dsQTL and an eQTL likely disrupts a distal regulatory region. In addition to disrupting the accessibility of the regulatory region, the variant also perturbs the expression of a gene influenced by the regulatory region. For example, a variant may decrease the chromatin accessibility of an enhancer region and thereby decrease the level of active histone marks for the enhancer. This decreased enhancer activity can result in decreased transcription from a nearby gene and similarly decreased active mark levels for the gene. Therefore, the hypothesis guiding this analysis is that variants influencing the histone marks of a distal regulatory region will have a coordinated effect on histone marks at genes under the control of the regulatory region. The authors examine the allelic imbalance in ChIP-Seq reads at regulatory regions and their associated transcription start sites (TSS). Indeed, the authors observe that variants that increase DNase I sensitivity have a significant positive allelic imbalance for active marks at both the regulatory region and the TSS. The opposite is true for the repressive mark. This result again emphasizes the complexity of gene regulation and the impact of non-coding variation. Not only do regulatory variants influence diverse molecular phenotypes nearby, they can direct changes at distal loci. As the authors note, this coordinated change in histone marks between distal regions possibly reflects the 3D organization of chromatin. Regulatory variants that impact chromatin looping interactions between distal regulatory regions and genes may cause changes in activity levels for both the gene and the regulatory region.


This paper provides clear evidence that regulatory variation has very complex impacts affecting multiple and diverse molecular phenotypes at multiple regions simultaneously. This complexity implies potentially numerous and diverse mechanisms by which regulatory variants act on gene regulation. The authors set out to find evidence for one of these mechanisms, namely perturbation of TF binding sites. They begin by showing that variation in histone modifications has a strong genetic basis and that the polymorphisms influencing these marks overlap with known regulatory variants such as eQTLs. They then show that polymorphisms in TF binding sites associate with changes in histone marks, providing evidence for directionality in the relationship between these marks and gene regulation. In essence, their results suggest that histone modifications are directed, at least in part, by TF binding. Finally, they find that regulatory variants can have an impact on the molecular phenotypes of distal regions.

I found this paper, as well as the other three previously mentioned, to be quite interesting. I think these papers show that our understanding of gene regulation is still very simplistic. With the advent of high-throughput molecular assays like ChIP-Seq and DNase-Seq, we can begin to interrogate the complex role of regulatory variation on many phenotypes. In doing so, it is of primary interest to ask questions regarding directionality. How does a given set of molecular phenotypes relate to one another? Do these phenotypes represent a cause or a consequence of genome function? How do the diverse elements of gene regulation function together to build complex phenotypes?


[1] Kilpinen, H., Waszak, S. M., Gschwind, A. R., Raghav, S. K., Witwicki, R. M., Orioli, A., Dermitzakis, E. T., et al. (2013). Coordinated Effects of Sequence Variation on DNA Binding, Chromatin Structure, and Transcription. Science (New York, N.Y.), 744. doi:10.1126/science.1242463

[2] McVicker, G., van de Geijn, B., Degner, J. F., Cain, C. E., Banovich, N. E., Raj, A., Pritchard, J. K., et al. (2013). Identification of Genetic Variants That Affect Histone Modifications in Human Cells. Science (New York, N.Y.), 747. doi:10.1126/science.1242429

[3] Kasowski, M., Kyriazopoulou-Panagiotopoulou, S., Grubert, F., Zaugg, J. B., Kundaje, A., Liu, Y., Snyder, M., et al. (2013). Extensive Variation in Chromatin States Across Humans. Science (New York, N.Y.), 750. doi:10.1126/science.1242510

[4] Heinz, S., Romanoski, C. E., Benner, C., Allison, K. A., Kaikkonen, M. U., Orozco, L. D., & Glass, C. K. (2013). Effect of natural genetic variation on enhancer selection and function. Nature, 503(7477), 487–492. doi:10.1038/nature12615

[5] Pickrell, J. K., Marioni, J. C., Pai, A. A., Degner, J. F., Engelhardt, B. E., Nkadori, E., Pritchard, J. K., et al. (2010). Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature, 464(7289), 768–772. doi:10.1038/nature08872

[6] Degner, J. F., Pai, A. A., Pique-Regi, R., Veyrieras, J.-B., Gaffney, D. J., Pickrell, J. K., Pritchard, J. K., et al. (2012). DNase I sensitivity QTLs are a major determinant of human expression variation. Nature, 482(7385), 390–394. doi:10.1038/nature10808

Paper author Jonathan Pritchard is a professor in the Departments of Genetics and Biology.
