Biochemical constraints on genes involved in early embryonic development.


Post author Sandeep Venkataram is a graduate student in the Petrov lab.

Chemical reactions form the foundation of life, yet such elementary activities are rarely considered when trying to understand higher-level processes, such as embryonic development. Nevertheless, as recently shown by Artieri and Fraser (MBE 2014), limitations on the kinetics of gene expression strongly constrain the length of highly expressed transcripts during early embryonic development of fruit flies. Furthermore, this phenomenon appears to be a general feature of fruit fly development as it is evolutionarily conserved across a number of species.

The long and short of mRNA transcription

It has long been known that only a portion of each mRNA molecule is used to produce functional protein: genes in multicellular species contain many ‘introns’, which must first be transcribed and then spliced out before translation can occur. Introns can be very long, so transcription of some mRNA molecules takes a significant amount of time. For example, one 2.3 million bp transcript in humans takes over half a day to produce. This creates a problem, because incompletely transcribed mRNA molecules are degraded when DNA is replicated at the beginning of cell division, and the process must begin anew once division is complete. Together, these facts imply that cell divisions must be spaced far enough apart for the cell to produce all of the transcripts it needs for growth before the next division occurs.

Studies of fruit fly development have shown that zygotes undergo “syncytial division” at the beginning of development, in which the zygotic nuclei divide every ~10 minutes for 9 cycles, followed by 4 additional, progressively longer divisions. While most mRNA in the cell at this time is supplied by the mother (maternal mRNA), this is also the phase during which the zygote begins producing its own mRNA. The extremely rapid divisions led Artieri and Fraser to hypothesize that long mRNA molecules transcribed from the zygotic genome may be underrepresented during these early stages of development. Maternal mRNAs, on the other hand, would be unaffected, as they are already present in the cell and do not have to be transcribed.
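The arithmetic behind this hypothesis can be sketched in a few lines of Python. The elongation rate used here is an assumption, not a figure from the paper: it is simply the upper bound implied by the human example above (2.3 million bp taking more than half a day), and the true rate in fly embryos may differ. The function names are hypothetical.

```python
# Back-of-the-envelope sketch: how long a transcript could be completed
# within a single division cycle?
# ASSUMPTION: the elongation rate is the upper bound implied by the human
# example in the text (2.3 million bp in more than 12 hours). It is not a
# value reported by Artieri and Fraser.

HUMAN_TRANSCRIPT_BP = 2_300_000
HUMAN_MIN_MINUTES = 12 * 60  # "over half a day"


def implied_max_rate_bp_per_min(length_bp=HUMAN_TRANSCRIPT_BP,
                                minutes=HUMAN_MIN_MINUTES):
    """Upper bound on elongation rate: the transcript needs at least `minutes`."""
    return length_bp / minutes


def max_transcript_bp(cycle_minutes, rate_bp_per_min):
    """Longest transcript that finishes if the entire cycle is spent transcribing."""
    return cycle_minutes * rate_bp_per_min


rate = implied_max_rate_bp_per_min()
limit = max_transcript_bp(10, rate)  # ~10-minute syncytial cycles
print(f"Implied elongation rate: at most {rate:,.0f} bp/min")
print(f"Longest transcript completed in a 10-minute cycle: about {limit:,.0f} bp")
```

Under these assumptions, a 10-minute cycle leaves room for transcripts of at most roughly 32 kb, and less if only part of the cycle is actually available for transcription.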

Transcript length vs. developmental timing

The authors used published data to classify embryonically expressed genes as “maternal” or “zygotic”, depending on whether or not the gene was present as maternal mRNA in unfertilized eggs. They then obtained multiple developmental mRNA expression timecourses and found that long zygotically expressed genes took longer to reach maximal expression levels than short genes, consistent with their inability to be fully transcribed during early development (Figure 1). Furthermore, they were able to use total RNA expression data to detect the presence of incomplete transcripts, indicating that the delay was due not to later transcriptional activation but to the incomplete production of transcripts.


Figure 1. Long zygotic genes are underexpressed early in the syncytial division phase relative to short genes, but catch up in expression by the end of the syncytial phase, while maternally derived transcripts show no such changes. [Modified from Artieri and Fraser 2014, Figure 2B.]

Using a published set of developmental mRNA expression timecourses from additional Drosophila species, Artieri and Fraser show that these patterns are consistent across all species examined. Finally, they also observed that the introns in highly expressed zygotic genes appear to be highly evolutionarily constrained in length when compared to introns in genes that are either maternally deposited or zygotically expressed at later timepoints. This suggests that natural selection has played a role in limiting the expansion of introns in early expressed zygotic genes, allowing them to escape ‘intron delay’.


In summary, Artieri and Fraser have found evidence that a significant fraction of zygotically expressed transcripts in fruit flies are delayed in reaching their maximal levels of expression because of the rapid cell cycles taking place at the beginning of development. This suggests a simple mechanism for developmental timing of zygotic gene expression: genes that are required early must be short, while genes whose expression is needed later can delay their expression via the presence of long introns. While some evidence for the use of intron length as a regulatory mechanism has recently emerged (Takashima et al. 2011), future experiments will be required to determine how widespread selection to maintain long introns and delayed expression is.


Carlo G. Artieri and Hunter B. Fraser. (2014) Transcript Length Mediates Developmental Timing of Gene Expression Across Drosophila. Molecular Biology and Evolution. doi:10.1093/molbev/msu226

Takashima Y, Ohtsuka T, González A, Miyachi H, Kageyama R. Intronic delay is essential for oscillatory expression in the segmentation clock. Proc Natl Acad Sci U S A. 2011;108:3300-3305.

Paper author Carlo Artieri is a postdoctoral fellow in the Fraser lab.



We sequence dead people

Blog-author: Sandeep Venkataram is a grad student in Dmitri Petrov’s lab.

By Sandeep Venkataram – Modern humans have been evolving independently of our nearest living relatives, chimpanzees, for over 7 million years. To study our evolutionary history since this divergence, our major source of information is the fossilized remains of our ancestors and of closely related species such as Neanderthals. Physiological information from the remains can tell us a lot about human evolution, but the majority of the information is locked up in the tiny amounts of highly degraded and fragmented DNA left in the specimen.

Studying ancient DNA (aDNA) from bones is extremely challenging. Each sample contains not only the specimen’s DNA, but also contaminating DNA from enormous numbers of microbes and other organisms. There is also the possibility of human contamination from handling the specimen. Since the contaminants have much higher quality DNA than the endogenous sample, simply processing a raw DNA extract of the sample typically yields sequencing results with less than 1% aDNA. Sequencing such a sample is incredibly inefficient, as less than 1% of the sequence is actually useful, and only the best-funded studies can afford the sequencing costs to generate a high-coverage genome. Therefore, most aDNA studies tend to focus on mitochondrial DNA, using targeted capture methods or PCR amplicon sequencing. This reduces wasted sequencing capacity, but greatly limits the amount of information obtained from the sample.
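A rough sketch of why a low endogenous fraction is so costly: the number of raw reads needed for a given genome coverage scales inversely with the fraction of reads that come from the specimen. The genome size and read length below are illustrative assumptions, and the calculation ignores duplicate and unmappable reads, so real requirements would be higher. The function name is hypothetical.

```python
# Sketch: raw reads needed to reach a target coverage when only a fraction
# of the library is endogenous DNA.
# ASSUMPTIONS (illustrative, not from the paper): a ~3.1 billion bp human
# genome, 100 bp reads, every endogenous read maps uniquely, no duplicates.

def reads_needed(target_coverage, endogenous_fraction,
                 genome_bp=3_100_000_000, read_bp=100):
    """Total raw reads required so endogenous reads alone give the coverage."""
    endogenous_reads = target_coverage * genome_bp / read_bp
    return endogenous_reads / endogenous_fraction


for fraction in (1.0, 0.10, 0.01):
    total = reads_needed(target_coverage=1, endogenous_fraction=fraction)
    print(f"{fraction:>5.0%} endogenous -> {total:,.0f} raw reads for 1x coverage")
```

At 1% endogenous DNA, the same coverage takes about 100 times as many raw reads as a pure sample, which is the inefficiency that enrichment methods aim to reduce.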

Getting the most out of ancient DNA

Meredith Carpenter, who is a postdoc here at Stanford, and her colleagues have developed a novel method called whole genome in-solution capture (WISC) to purify aDNA from next-gen sequencing libraries generated from ancient DNA samples. The authors make use of RNA bait technology (Figure 1), which uses RNA that is complementary to the targeted DNA and has been synthesized using biotinylated nucleotides (Gnirke et al. 2009). The RNA can be hybridized to the DNA pool and purified from solution by linking the biotin to commercially available purification beads. By then eluting the hybrids from the beads and removing the RNA from solution, one can greatly enrich for the targeted DNA. Previous methods using RNA baits required synthesizing DNA strands on microarrays that are complementary to the RNA, then inducing transcription in vitro to generate the RNA bait library. This methodology has been successfully used to capture the entirety of chromosome 21 (Fu et al. 2013) from human fossils, but is not cost-effective for producing a bait library that covers the entire human genome.

Meredith Carpenter and coauthors circumvent this by mechanically fragmenting human genomic DNA and using blunt-end ligation to attach adapter sequences containing an RNA polymerase promoter. This modified DNA library can then be used to generate the biotinylated RNA bait library via in vitro transcription, after which the RNA library is purified using standard protocols. The authors also generate non-biotinylated RNA complementary to the adapter sequences present on every DNA fragment in the aDNA library, to block nonspecific binding to the adapters. The bait and block RNA libraries are hybridized to the aDNA libraries, which are then purified using beads to select only the aDNA fragments and sequenced. The cost of their enrichment method is estimated at $50 per sample, which makes it accessible to most labs conducting aDNA genomic studies.

WISC greatly enriches for ancient DNA across a variety of samples

The authors tested this method on a variety of aDNA libraries prepared from both high-quality and low-quality samples, including hair remains, teeth and bones, and specimens from tropical and more temperate regions, since the region of origin can greatly influence DNA quality. They sequenced the libraries both before and after WISC, and found a 3- to 13-fold increase in the number of uniquely mapped reads after using WISC. In addition, for both hair and bone samples, most of the unique reads in the enriched library are recovered within the first five million reads sequenced. WISC allows most of the endogenous sequence to be read from dozens of aDNA samples in a single lane of Illumina HiSeq, opening the possibility of sequencing the millions of fossils in museums and collections around the world.
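As a quick sanity check on the "dozens of samples per lane" figure, the estimate can be sketched as a simple division. The five million reads per sample comes from the text; the lane yield is an assumed round number, since actual output depends on the instrument and chemistry. The function name is hypothetical.

```python
# Sketch: how many enriched libraries fit in one sequencing lane?
READS_PER_SAMPLE = 5_000_000          # from the text: most unique reads recovered
ASSUMED_READS_PER_LANE = 150_000_000  # ASSUMPTION: illustrative lane yield


def samples_per_lane(reads_per_lane, reads_per_sample):
    """Whole number of samples that can each receive `reads_per_sample` reads."""
    return reads_per_lane // reads_per_sample


n = samples_per_lane(ASSUMED_READS_PER_LANE, READS_PER_SAMPLE)
print(f"Samples per lane under these assumptions: {n}")
```

With the assumed yield this gives 30 samples per lane; a different lane yield would scale the number proportionally.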

Dr. Carpenter says they are now focusing on adapting the method to remove human DNA contamination from microbiome sequencing projects, with promising preliminary results, as well as on applications in forensics and the study of extinct species. As WISC generates the RNA bait library from genomic DNA of an extant relative instead of synthetic DNA arrays, bait libraries can be prepared regardless of whether the genome of the source organism has been sequenced. Combined with recent advances in aDNA library construction methods (Meyer et al. 2012), WISC promises to make sequencing of contaminated and degraded samples widely accessible.

Carpenter et al. (2013) Figure 1. Schematic of the Whole-Genome In-Solution Capture Process
To generate the RNA “bait” library, a human genomic library is created via adapters containing T7 RNA polymerase promoters (green boxes). This library is subjected to in vitro transcription via T7 RNA polymerase and biotin-16-UTP (stars), creating a biotinylated bait library. Meanwhile, the ancient DNA library (aDNA “pond”) is prepared via standard indexed Illumina adapters (purple boxes). These aDNA libraries often contain <1% endogenous DNA, with the remainder being environmental in origin. During hybridization, the bait and pond are combined in the presence of adapter-blocking RNA oligos (blue zigzags), which are complementary to the indexed Illumina adapters and thus prevent nonspecific hybridization between adapters in the aDNA library. After hybridization, the biotinylated bait and bound aDNA are pulled down with streptavidin-coated magnetic beads, and any unbound DNA is washed away. Finally, the DNA is eluted and amplified for sequencing.


Paper author Meredith Carpenter is a postdoc in Carlos Bustamante’s lab.


Carpenter, M. L., Buenrostro, J. D., Valdiosera, C., Schroeder, H., Allentoft, M. E., Sikora, M., Rasmussen, M., et al. (2013). Pulling out the 1%: Whole-Genome Capture for the Targeted Enrichment of Ancient DNA Sequencing Libraries. The American Journal of Human Genetics, 1–13. doi:10.1016/j.ajhg.2013.10.002

Fu, Q., Meyer, M., Gao, X., Stenzel, U., Burbano, H. A., Kelso, J., & Pääbo, S. (2013). DNA analysis of an early modern human from Tianyuan Cave, China. Proceedings of the National Academy of Sciences of the United States of America, 110(6), 2223–7. doi:10.1073/pnas.1221359110

Gnirke, A., Melnikov, A., Maguire, J., Rogov, P., LeProust, E. M., Brockman, W., Fennell, T., et al. (2009). Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted sequencing. Nature biotechnology, 27(2), 182–9. doi:10.1038/nbt.1523

Meyer, M., Kircher, M., Gansauge, M.-T., Li, H., Racimo, F., Mallick, S., Schraiber, J. G., et al. (2012). A high-coverage genome sequence from an archaic Denisovan individual. Science (New York, N.Y.), 338(6104), 222–6. doi:10.1126/science.1224344

Update: The second paragraph originally talked about fossils, but it should have been bones (now corrected). A fossil is mineralized and would not yield aDNA.