The wealth of genetic variation in the human genome is found not within protein-coding genes but within non-coding regions. This comes as no surprise given that only about 1% of the genome codes for proteins. Until recently, efforts to determine the effects of genetic variation on trait variation and disease have focused on coding regions. Results of genome-wide association studies (GWAS), however, have shown that trait- and disease-associated variants are often regulatory variants, such as expression quantitative trait loci (eQTLs), found in non-coding regions. These results have spurred an effort to understand the functional role of non-coding, regulatory variation. Efforts have thus far relied on characterizing the association between variants and gene expression. This association alone, however, will not reveal the complete functional mechanism by which non-coding variants influence gene expression. Recent efforts have therefore begun to characterize numerous molecular phenotypes such as transcription factor (TF) binding, histone modification, and chromatin state to determine the mechanisms by which regulatory variants affect gene expression.
One issue, four papers
In the November 8 issue of Science, three papers were published that address the role of non-coding genetic variation in TF binding, histone modifications, and chromatin state (i.e. active versus inactive enhancer status). The first study was completed by the Dermitzakis Lab at the University of Geneva. They analyzed three TFs, RNA polymerase II (Pol II), and five histone modifications using chromatin immunoprecipitation and sequencing (ChIP-Seq) in lymphoblastoid cell lines (LCLs) from two parent-child trios [1]. The second was completed by the Pritchard Lab, which has recently moved to Stanford, and the Gilad Lab at the University of Chicago. They identified genetic variants affecting variation in four histone modifications and Pol II occupancy in ten unrelated Yoruba LCLs [2]. The third study was performed by the Snyder Lab at Stanford. They characterized the genetic variation underlying changes in chromatin state using RNA-Seq and ChIP-Seq for four histone modifications and two DNA binding factors in 19 LCLs from diverse populations [3]. This work was the subject of a recent CEHG Evolgenome talk given by Maya Kasowski, the study’s first author. Finally, the fourth study, published in the November 28 issue of Nature, was performed by the Glass Lab at UCSD. They characterized the effect of natural genetic variation between two mouse strains on the binding of two TFs involved in cell differentiation (PU.1 and C/EBPα) using ChIP-Seq [4]. In this post, I will analyze primarily the work presented by the Pritchard Lab, but I strongly recommend reading all four papers to understand the challenges in characterizing non-coding variation and the methods available to do so.
The four studies seek to answer the general question of how regulatory variation affects gene expression. They characterize diverse molecular phenotypes such as histone modifications and TF binding to understand the mechanisms of action for non-coding variants. The Pritchard Lab focused their study on four histone modifications (three active and one repressive: H3K4me3, H3K4me1, H3K27ac, and H3K27me3, respectively) and Pol II occupancy.
Histone modifications 101
Histone modifications refer to the addition of chemical groups such as methyl or acetyl to specific amino acids on the tails of the histone proteins that make up the nucleosome. These chemical groups are referred to as histone marks. They can serve a wide range of functions, but in general they are associated with the accessibility of a chromatin region. For example, the tri-methylation of lysine 4 of histone H3 (H3K4me3) is associated with increased chromatin accessibility and gene activation. On the other hand, increased levels of the repressive mark H3K27me3 (tri-methylation of lysine 27 of histone H3) at promoters are associated with gene inactivation.
Histone mark levels are measured in a high-throughput manner using ChIP-Seq. Briefly, an antibody targeting the mark of interest is used to pull down modified genomic regions. These immunoprecipitated regions are then sequenced to determine which genomic segments are modified and at what level. The procedure usually requires a large number of cells (on the order of 10^7). Therefore, the modification level is, in some ways, a population-level measurement. Analysis of ChIP-Seq data typically involves testing for genomic regions with more reads than expected by chance. These regions, ranging from 200 bp to 1,000 bp or more, are referred to as peaks and represent a modification level above the genomic background. Repressive marks like H3K27me3 tend to have broad peak regions, while activating marks like H3K4me3 can have much tighter peaks.
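To make "more reads than expected by chance" concrete, here is a minimal sketch of a Poisson enrichment test over fixed windows. The window counts, the `alpha` cutoff, and the function names are all hypothetical, and the background rate is simply the genome-wide mean. Real peak callers use more careful local background models and control samples, so treat this as an illustration of the idea rather than the procedure used in any of the four papers.

```python
import math

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed by summing the lower tail."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)

def call_peaks(window_counts, alpha=1e-3):
    """Return indices of windows with more reads than the background predicts."""
    # Assumption: the mean over all windows stands in for the background rate.
    background = sum(window_counts) / len(window_counts)
    return [i for i, c in enumerate(window_counts)
            if poisson_sf(c, background) < alpha]

# Hypothetical read counts in ten consecutive windows.
counts = [3, 2, 4, 3, 25, 30, 2, 3, 4, 2]
peaks = call_peaks(counts)
print(peaks)
```

Adjacent significant windows would normally be merged into a single peak, which is one reason broad marks like H3K27me3 yield wide regions.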
Since modification levels represent measurements on a population of cells and histone residues can have multiple modifications, genomic regions can show evidence for multiple marks. The combination of marks over a region can indicate the function of that region. For example, regions with high levels of H3K27ac and a high ratio of H3K4me1 to H3K4me3 can mark active enhancer regions. Until now, the variation of these marks between individuals and the genetic cause of this variation were uncharacterized. Moreover, the causal impact of these marks remains unknown. Do they alter gene expression directly or are they altered by gene regulation? Therefore, the two guiding questions for this study are:
1. What genetic variants influence histone modifications?
2. Are these modifications “a cause or a consequence of gene regulation?”
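As an aside on how combinations of marks can be read, the active-enhancer signature mentioned above (high H3K27ac together with a high H3K4me1 to H3K4me3 ratio) can be written as a simple rule. The thresholds `ac_min` and `ratio_min`, the pseudocount, and the function name are hypothetical values chosen for illustration. They are not taken from any of the papers, which use more sophisticated chromatin-state models.

```python
def classify_region(h3k27ac, h3k4me1, h3k4me3, ac_min=10.0, ratio_min=2.0):
    """Label a region from normalized mark levels using illustrative cutoffs."""
    # A pseudocount of 1 keeps the ratio finite when H3K4me3 is absent.
    ratio = h3k4me1 / (h3k4me3 + 1.0)
    if h3k27ac >= ac_min and ratio >= ratio_min:
        return "active enhancer-like"
    return "other"

# Hypothetical normalized signal values for two regions.
enhancer_call = classify_region(h3k27ac=20.0, h3k4me1=30.0, h3k4me3=4.0)
promoter_call = classify_region(h3k27ac=20.0, h3k4me1=5.0, h3k4me3=40.0)
print(enhancer_call, promoter_call)
```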
Variation in histone modifications, a real whodunit
The authors first seek to identify and characterize genetic variants that influence histone marks. They generated ChIP-Seq data for the four histone marks and Pol II in LCLs derived from ten unrelated Yoruba individuals who were previously genotyped as part of the 1000 Genomes Project. Similar studies of regulatory variants, such as eQTL studies, require large sample sizes to detect the effects of regulatory variants that often lie outside the gene. In contrast to gene expression measurements, histone marks cover fairly broad regions that often encompass the causal regulatory variants themselves. As a result, the authors can use a smaller sample size and still be confident about interrogating the effects of causal regulatory SNPs. The authors developed a statistical test that models total read depth between individuals and allelic imbalance between haplotypes within individuals to increase power to detect cis-QTLs (i.e. variants that affect histone marks and Pol II occupancy nearby in the genome). Using this method, they identified over 1,200 distinct QTLs for histone marks and Pol II occupancy (FDR 20%).
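To give a feel for the allelic-imbalance half of such a test, the sketch below runs a binomial likelihood-ratio test on reads pooled across heterozygous individuals. This is not the authors' method. It ignores the total read depth component entirely, assumes plain binomial sampling, and uses hypothetical read counts and function names.

```python
import math

def binom_loglik(ref, alt, p):
    """Binomial log-likelihood of ref/alt read counts, without the constant term."""
    return ref * math.log(p) + alt * math.log(1.0 - p)

def allelic_imbalance_lrt(het_counts):
    """Test p = 0.5 against a free reference-allele fraction shared by all samples."""
    ref = sum(r for r, _ in het_counts)
    alt = sum(a for _, a in het_counts)
    # Clamp the estimate away from 0 and 1 so the logarithms stay finite.
    p_hat = min(max(ref / (ref + alt), 1e-6), 1.0 - 1e-6)
    stat = 2.0 * (binom_loglik(ref, alt, p_hat) - binom_loglik(ref, alt, 0.5))
    stat = max(stat, 0.0)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return p_hat, stat, p_value

# Hypothetical (ref, alt) ChIP-Seq read counts at one SNP in three heterozygotes.
imbalanced = allelic_imbalance_lrt([(30, 10), (28, 12), (35, 15)])
balanced = allelic_imbalance_lrt([(20, 20), (15, 15)])
print(imbalanced, balanced)
```

Because both alleles sit in the same nucleus, allele-specific reads within one individual control for trans effects and batch differences, which is part of why combining them with between-individual depth can raise power in a sample of only ten.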
The authors then analyze these histone mark and Pol II QTLs to determine the overlap of these variants with other known regulatory variants. The hypothesis is that regulatory variants that affect gene expression will have effects on diverse molecular phenotypes. Therefore, variants that influence histone marks and Pol II should show significant overlap with known regulatory variants such as eQTLs and DNase I sensitivity QTLs (dsQTLs). DNase I sensitivity is a measure of chromatin accessibility, with higher sensitivity associated with higher accessibility. The Pritchard Lab mapped eQTLs and dsQTLs in a larger sample of ~75 Yoruba LCLs in two previous studies that I also recommend reading [5,6]. Their analysis revealed an enrichment of low p-values for dsQTLs and, to a lesser extent, eQTLs when tested as histone mark and Pol II QTLs. In addition, the authors observed a coordinated change in multiple molecular phenotypes at dsQTLs and eQTLs. For example, higher levels of the three active histone marks were observed at dsQTLs for the more DNase I sensitive genotype. At eQTLs, H3K4me3, H3K27ac, and Pol II levels were higher for individuals with the high expression genotype. These results show that non-coding regulatory variants impact multiple molecular phenotypes ranging from chromatin accessibility and transcription to histone modifications. The authors provide strong evidence in response to their first guiding question, namely that non-coding regulatory polymorphisms associate with variation in histone marks and Pol II.
TFs and a question of directionality
The authors then turn to the question of causality for these marks. To do so, they analyze genetic variants in TF binding sites. The main hypothesis is that regulatory variants that alter a TF binding site will modify TF binding, which will in turn cause changes in histone mark and Pol II levels nearby. If this is the case, then changes in histone marks are a consequence of how strong the TF binding site is. On the other hand, if these marks were causal, polymorphisms in TF binding sites would not be expected to show strong association with changes in these marks.
To test their hypothesis, the authors examine ~11.5K TF binding sites with polymorphisms that are heterozygous in at least one of their ten individuals. They calculate the change in position weight matrix (PWM) score between the two alleles for polymorphic TF binding sites within each individual. They then test for significant association between this change in PWM score and allelic imbalance of ChIP-Seq reads at nearby heterozygous sites. The idea is that if a variant improves (or disrupts) TF binding for one allele at a TF binding site, then active histone marks nearby on the same allele will increase (or decrease). Repressive histone marks (in this case H3K27me3) are expected to have the opposite response. Indeed, when they apply their test, they find a significant positive association for the active marks and a negative association for the repressive mark. This result supports the hypothesis that changes in histone marks are a consequence of TF binding and gene regulation. However, this result does not rule out other possibilities. Histone marks can still play a causal role in the establishment of TF binding. In other words, the relationship between TF binding and histone marks does not have to be unidirectional. In addition, there is evidence that long non-coding RNAs may play a role in the establishment and regulation of histone marks.
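The PWM comparison can be illustrated with a toy example. The sketch below scores two alleles of a binding site against a made-up 4-bp motif and compares the sign of the score change with the sign of the allelic read ratio for a nearby active mark. The motif, the uniform background, the pseudocount, and the read counts are all hypothetical, and the authors' actual association test is more involved than a sign comparison.

```python
import math

BACKGROUND = 0.25  # assumption: uniform base frequencies

# Hypothetical 4-bp motif; each column gives base probabilities at one position.
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]

def pwm_score(site, pwm=PWM):
    """Log2-odds score of a site under the motif relative to background."""
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(pwm, site))

def delta_pwm(ref_site, alt_site):
    """Change in PWM score going from the reference to the alternate allele."""
    return pwm_score(alt_site) - pwm_score(ref_site)

def log_allelic_ratio(ref_reads, alt_reads):
    """Log2 ratio of alt to ref ChIP-Seq reads, with a pseudocount of 1."""
    return math.log2((alt_reads + 1) / (ref_reads + 1))

# The alternate allele swaps the preferred G at position 3 for a T.
delta = delta_pwm("ACGT", "ACTT")
ratio = log_allelic_ratio(ref_reads=40, alt_reads=12)
concordant = (delta < 0) == (ratio < 0)
print(delta, ratio, concordant)
```

For an active mark, a motif-weakening allele with fewer reads counts as concordant. For H3K27me3 the expected relationship is reversed.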
dsQTLs and eQTLs, a match made on chromatin
In their final analysis, the authors examine dsQTLs that are also eQTLs. Since these variants associate with both gene expression and chromatin accessibility at distal regulatory regions (>5 kb from the associated transcription start site, or TSS), the authors can assign the regulatory region to a specific gene. A variant that is both a dsQTL and an eQTL likely disrupts a distal regulatory region. In addition to disrupting the accessibility of the regulatory region, the variant also perturbs the expression of a gene influenced by the regulatory region. For example, a variant may decrease the chromatin accessibility of an enhancer region and thereby decrease the level of active histone marks for the enhancer. This decreased enhancer activity can result in decreased transcription from a nearby gene and similarly decreased active mark levels for the gene. Therefore, the hypothesis guiding this analysis is that variants influencing the histone marks of a distal regulatory region will have a coordinated effect on histone marks at genes under the control of the regulatory region. The authors examine the allelic imbalance in ChIP-Seq reads at regulatory regions and their associated TSSs. Indeed, the authors observe that variants that increase DNase I sensitivity have a significant positive allelic imbalance for active marks at both the regulatory region and the TSS. The opposite is true for the repressive mark. This result again emphasizes the complexity of gene regulation and the impact of non-coding variation. Not only do regulatory variants influence diverse molecular phenotypes nearby, they can direct changes at distal loci. As the authors note, this coordinated change in histone marks between distal regions possibly reflects the 3D organization of chromatin. Regulatory variants that impact chromatin looping interactions between distal regulatory regions and genes may cause changes in activity levels for both the gene and the regulatory region.
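One simple way to picture this coordinated imbalance is a sign test. For each dsQTL that is also an eQTL, ask whether the allele with higher DNase I sensitivity also carries more active-mark reads, once at the distal regulatory region and once at the TSS. The sketch below uses hypothetical read counts and an exact one-sided binomial test. It stands in for the idea only and is not the analysis the authors performed.

```python
from math import comb

def favors_high_allele(high_reads, low_reads):
    """True if the high-sensitivity allele carries more than half the reads."""
    return high_reads / (high_reads + low_reads) > 0.5

def sign_test_greater(successes, n):
    """One-sided exact binomial test: P(X >= successes) when p = 0.5."""
    return sum(comb(n, k) for k in range(successes, n + 1)) / 2 ** n

# Hypothetical active-mark reads as (high-allele, low-allele) pairs:
# first at the distal regulatory region, then at the associated TSS.
sites = [
    ((30, 18), (25, 15)), ((22, 14), (19, 12)),
    ((40, 25), (31, 20)), ((18, 20), (16, 15)),
    ((27, 13), (24, 14)), ((33, 21), (20, 11)),
    ((26, 15), (22, 13)), ((29, 17), (18, 10)),
]

distal_hits = sum(favors_high_allele(*distal) for distal, _ in sites)
tss_hits = sum(favors_high_allele(*tss) for _, tss in sites)
p_distal = sign_test_greater(distal_hits, len(sites))
p_tss = sign_test_greater(tss_hits, len(sites))
print(distal_hits, p_distal, tss_hits, p_tss)
```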
This paper provides clear evidence that regulatory variation has very complex impacts affecting multiple and diverse molecular phenotypes at multiple regions simultaneously. This complexity implies potentially numerous and diverse mechanisms by which regulatory variants act on gene regulation. The authors set out to find evidence for one of these mechanisms, namely perturbation of TF binding sites. They begin by showing that variation in histone modifications has a strong genetic basis and that the polymorphisms influencing these marks overlap with known regulatory variants such as eQTLs. They then show that polymorphisms in TF binding sites associate with changes in histone marks, providing evidence for directionality in the relationship between these marks and gene regulation. In essence, their results suggest that histone modifications are directed, at least in part, by TF binding. Finally, they find that regulatory variants can have an impact on the molecular phenotypes of distal regions.
I found this paper, as well as the other three previously mentioned, to be quite interesting. I think these papers show that our understanding of gene regulation is still very simplistic. With the advent of high-throughput molecular assays like ChIP-Seq and DNase-Seq, we can begin to interrogate the complex role of regulatory variation in many phenotypes. In doing so, it is of primary interest to ask questions regarding directionality. How does a given set of molecular phenotypes relate to one another? Do these phenotypes represent a cause or a consequence of genome function? How do the diverse elements of gene regulation function together to build complex phenotypes?
[1] Kilpinen, H., Waszak, S. M., Gschwind, A. R., Raghav, S. K., Witwicki, R. M., Orioli, A., Dermitzakis, E. T., et al. (2013). Coordinated Effects of Sequence Variation on DNA Binding, Chromatin Structure, and Transcription. Science (New York, N.Y.), 744. doi:10.1126/science.1242463
[2] McVicker, G., van de Geijn, B., Degner, J. F., Cain, C. E., Banovich, N. E., Raj, A., Pritchard, J. K., et al. (2013). Identification of Genetic Variants That Affect Histone Modifications in Human Cells. Science (New York, N.Y.), 747. doi:10.1126/science.1242429
[3] Kasowski, M., Kyriazopoulou-Panagiotopoulou, S., Grubert, F., Zaugg, J. B., Kundaje, A., Liu, Y., Snyder, M., et al. (2013). Extensive Variation in Chromatin States Across Humans. Science (New York, N.Y.), 750. doi:10.1126/science.1242510
[4] Heinz, S., Romanoski, C. E., Benner, C., Allison, K. A., Kaikkonen, M. U., Orozco, L. D., & Glass, C. K. (2013). Effect of natural genetic variation on enhancer selection and function. Nature, 503(7477), 487–492. doi:10.1038/nature12615
[5] Pickrell, J. K., Marioni, J. C., Pai, A. A., Degner, J. F., Engelhardt, B. E., Nkadori, E., Pritchard, J. K., et al. (2010). Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature, 464(7289), 768–72. doi:10.1038/nature08872
[6] Degner, J. F., Pai, A. A., Pique-Regi, R., Veyrieras, J.-B., Gaffney, D. J., Pickrell, J. K., Pritchard, J. K., et al. (2012). DNase I sensitivity QTLs are a major determinant of human expression variation. Nature, 482(7385), 390–4. doi:10.1038/nature10808