Caught in the act: how drug-resistance mutations sweep through populations of HIV

Blog author Meredith Carpenter is a postdoc in Carlos Bustamante's lab.

It has been over 30 years since the emergence of HIV/AIDS, yet the disease continues to kill over one million people worldwide per year [UNAIDS report]. One reason this epidemic has been so difficult to control is that HIV evolves quickly: it has a short replication time and a high mutation rate, so viruses harboring new mutations that confer drug resistance tend to arise often and spread quickly.

However, the likelihood that one of these beneficial mutations pops up and then “sweeps” through the viral population (that is, becomes more common because of the survival advantage it confers) also depends on the underlying population genetics, much of which is still poorly understood. In a paper just published in PLoS Genetics, Pleuni Pennings, a postdoc in the Petrov lab, and her colleagues Sergey Kryazhimskiy and John Wakeley from Harvard tracked the genetic diversity in adapting populations of HIV to better understand how and when new mutations arise.

Mutations and populations

Mutations are usually caused either by DNA damage (e.g., from environmental factors like UV radiation) or by a mistake during DNA replication. Because HIV is a retrovirus, meaning it must copy its RNA genome into DNA before it can be reproduced in the host cell, it is especially prone to errors that happen during the replication process. The rate at which these errors occur, called the mutation rate, is constant on a per-virus basis: for example, a specific mutation might happen in one virus in a million. As a consequence, the overall number of viruses in the population determines how many new mutations will be present, with a larger population harboring more mutations at any given time.
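To make that arithmetic concrete, here is a minimal Python sketch. It assumes the illustrative "one virus in a million" rate applies to one specific mutation per round of replication and that mutants arise independently, so their count is roughly Poisson. The population sizes are illustrative choices, with 100 million echoing the number of virus-producing cells mentioned later in this post; the function names are hypothetical.

```python
import math

def expected_new_mutants(population_size: int, mutation_rate: float) -> float:
    """Expected number of viruses that acquire one specific mutation in a
    single round of replication: population size times per-virus rate."""
    return population_size * mutation_rate

def prob_mutation_present(population_size: int, mutation_rate: float) -> float:
    """Chance that at least one virus carries the new mutation, using a
    Poisson approximation: 1 - exp(-N * mu)."""
    return -math.expm1(-expected_new_mutants(population_size, mutation_rate))

# Illustrative rate from the text: one specific mutation in one virus in a million.
MU = 1e-6

for n in (1_000, 1_000_000, 100_000_000):
    print(f"N = {n:>11,}: expected mutants = {expected_new_mutants(n, MU):g}, "
          f"P(at least one) = {prob_mutation_present(n, MU):.4f}")
```

A population of a thousand viruses almost never contains the mutation, while a population of a hundred million is expected to carry about a hundred copies of it at any given time.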

Whether these mutations will survive, however, depends on what population geneticists call the “effective population size” (Ne), which takes genetic diversity into account. Due to a combination of factors, including the purely random destruction of some viruses, not every mutation will be preserved in the population, no matter how beneficial it is. Ne is a purely theoretical measure that tells us how easily and quickly a new mutation can spread through a population. Because it accounts for factors that affect diversity, it is usually smaller than the actual (or “census”) population size.
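Why can a beneficial mutation still be lost? A classic textbook illustration (Haldane's branching-process argument, not the method used in the paper) treats each carrier of the mutation as leaving a random, Poisson-distributed number of offspring with mean 1 + s, where s is the mutation's advantage. The sketch below assumes that toy model and computes the chance that the lineage escapes random loss; the function name and the s values are hypothetical.

```python
import math

def establishment_probability(s: float, tol: float = 1e-12, max_iter: int = 1_000_000) -> float:
    """Probability that a single new beneficial mutant escapes random loss.

    Toy model: each carrier leaves a Poisson(1 + s) number of offspring
    (a Galton-Watson branching process). The extinction probability q is the
    smallest solution of q = exp((1 + s) * (q - 1)); fixed-point iteration
    starting from q = 0 climbs monotonically to that root.
    """
    if s <= 0:
        raise ValueError("s must be positive for the mutation to be beneficial")
    q = 0.0
    for _ in range(max_iter):
        q_next = math.exp((1.0 + s) * (q - 1.0))
        if abs(q_next - q) < tol:
            return 1.0 - q_next
        q = q_next
    return 1.0 - q

for s in (0.01, 0.05, 0.1):
    print(f"s = {s}: survives with probability {establishment_probability(s):.3f} "
          f"(Haldane's approximation 2s = {2 * s:.2f})")
```

Even a mutation that gives a 10% advantage is lost in most cases under this model, which is the sense in which survival is not guaranteed "no matter how beneficial" a mutation is.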

Pennings and colleagues wanted to determine the Ne for HIV in a typical patient undergoing drug treatment. This is a contentious area: previous researchers examining this question using different methods, including simply summing up overall mutation numbers, came up with estimates of Ne ranging from one thousand to one million (in contrast, the actual number of virus-producing cells in the body is closer to one hundred million, but more on that later). To get a more exact estimate, Pennings took a new approach. Using previously published DNA sequences of HIV sampled from patients over the course of a drug treatment regimen, she looked at the actual dynamics of the development of drug-resistant virus populations over time.

Swept away

Specifically, Pennings focused on selective sweeps, in which an advantageous mutation appears and then rises in frequency in the population. Features of these sweeps can be used to estimate Ne because they reveal information about the diversity present in the initial population. Pennings sought to distinguish between “hard” and “soft” selective sweeps occurring as the viruses became drug resistant. A hard sweep occurs when a mutation appears in one virus and then rises in frequency, whereas a soft sweep happens when multiple viruses independently gain resistance mutations, which then rise in frequency over time (see Figure 1). These two types of sweeps leave distinct fingerprints, and their relative frequencies depend on the underlying effective population size: soft sweeps are more likely in larger populations, because the more viruses there are, the more likely it is that beneficial mutations arise independently in two or more of them. Soft sweeps also leave more diversity in the adapted population than hard sweeps do (Figure 1).
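The link between population size and soft sweeps can be illustrated with a deliberately simple toy model (an assumption for illustration, not the estimator used in the paper): suppose the number of independent resistant lineages that arise and escape random loss is Poisson-distributed with a mean proportional to Ne. The sketch asks how often a sweep that does happen has two or more origins. The per-capita rate and the function name are hypothetical.

```python
import math

def prob_soft_sweep(ne: float, establishing_rate_per_capita: float) -> float:
    """Toy model: the number of independent resistant lineages that arise and
    escape random loss is Poisson with mean lam = ne * rate. Returns
    P(at least two origins | at least one), i.e. how often a sweep is soft."""
    lam = ne * establishing_rate_per_capita
    if lam <= 0:
        raise ValueError("expected number of origins must be positive")
    p_any = -math.expm1(-lam)        # P(k >= 1)
    p_one = lam * math.exp(-lam)     # P(k == 1)
    return (p_any - p_one) / p_any

RATE = 1e-6  # hypothetical: mutation rate times establishment probability
for ne in (1_000, 100_000, 10_000_000):
    print(f"Ne = {ne:>10,}: P(soft | sweep) = {prob_soft_sweep(ne, RATE):.4f}")
```

In small populations nearly every sweep is hard; once the expected number of successful origins climbs above one, soft sweeps dominate. That shift is why the mix of hard and soft sweeps carries information about Ne.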

Figure 1, an illustration of a hard sweep (left) and a soft sweep (right).

To tell these types of sweeps apart, Pennings took advantage of a specific amino acid change in the HIV gene that encodes reverse transcriptase (RT). This change can result from two different nucleotide changes, either of which changes the amino acid from lysine to asparagine and confers resistance to drugs that target the RT protein. Pennings used this handy feature to identify hard and soft sweeps: if she observed both mutations in the same drug-resistant population, then the sweep was soft. If only one mutation was observed, the sweep could be soft or hard, so she also factored in diversity levels to tell these apart.

Pennings found evidence of both hard and soft sweeps in her study populations. Based on the frequencies of each, she estimated the Ne of HIV in the patients at 150,000, which is higher than some previous estimates but still lower than the actual number of virus-infected cells in the body. Pennings suggests that this discrepancy could be due to the background effects of other mutations in the viruses that gain the drug-resistance mutation. That is, even if a virus gets the valuable resistance mutation, it might still disappear from the population because it happens to carry another, damaging mutation as well. This would reduce the effective population size as measured by selective sweeps.
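The logic of the two-route test can be sketched in Python using the standard genetic code. The post does not give the wild-type codon, so the sketch assumes the lysine is encoded by AAA; under that assumption exactly two single-base changes (to AAC or AAT) produce asparagine. The classification rule mirrors the text: both routes in one population means a soft sweep, while one route alone is ambiguous. The diversity-based step is not modelled, and all function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (one-letter amino acids, * = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def point_mutants(codon):
    """Yield every codon that differs from `codon` at exactly one position."""
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                yield codon[:pos] + base + codon[pos + 1:]

def resistance_codons(wild_type, target_aa="N"):
    """Single-nucleotide changes of `wild_type` that encode `target_aa`."""
    return sorted(c for c in point_mutants(wild_type) if CODON_TABLE[c] == target_aa)

def classify_sweep(resistant_codons_seen, routes):
    """Both routes in one population means resistance arose at least twice,
    so the sweep was soft; a single route alone is ambiguous."""
    present = set(resistant_codons_seen) & set(routes)
    if len(present) >= 2:
        return "soft"
    if len(present) == 1:
        return "hard or soft (needs diversity data)"
    return "no resistant codon observed"

WILD_TYPE = "AAA"  # assumed lysine codon; the post does not specify it
routes = resistance_codons(WILD_TYPE)
print(CODON_TABLE[WILD_TYPE], routes)
print(classify_sweep({"AAC", "AAT"}, routes))
print(classify_sweep({"AAT"}, routes))
```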

Implications and future work

Pennings’ findings have several implications. The first is that HIV populations have a limited supply of resistance mutations, as evidenced by the presence of hard sweeps (which, remember, occur when a sweep starts from a single mutation). This means that even small reductions in Ne, such as those produced by combination drug therapies, could have a big impact on preventing drug resistance. The second relates to the fact that, as described above, the likelihood that a mutation will sweep the population may be affected by background mutations in the virus in which it appears. This finding suggests that mutagenic drugs, given in combination with standard antiretrovirals, could be particularly useful for reducing drug resistance, because adding damaging background mutations would make it less likely that any given resistance mutation survives. Now, Pennings is using larger datasets to determine whether some types of drugs lead to fewer soft sweeps (presumably because they reduce Ne). She is also trying to understand why drug resistance in HIV evolves in a stepwise fashion (one mutation at a time), even if three drugs are used in combination.

Paper author Pleuni Pennings is a postdoc in the lab of Dmitri Petrov.


Pennings, P. S., Kryazhimskiy, S., & Wakeley, J. (2014). Loss and recovery of genetic diversity in adapting HIV populations. PLoS Genetics.

We sequence dead people

Blog author Sandeep Venkataram is a grad student in Dmitri Petrov’s lab.

Modern humans have been evolving independently of our nearest living relatives, chimpanzees, for over 7 million years. Our major source of information about our evolutionary history since this divergence is the fossilized remains of our ancestors and of closely related species such as Neanderthals. Physiological information from these remains can tell us a lot about human evolution, but most of the information is locked up in the tiny amounts of highly degraded and fragmented DNA left in the specimens.

Studying ancient DNA (aDNA) from bones is extremely challenging. Each sample contains not only the specimen’s own DNA but also contaminating DNA from enormous numbers of microbes and other organisms, and handling the bone can add human contamination as well. Because the contaminants have much higher quality DNA than the endogenous sample, simply processing a raw DNA extract typically yields sequencing results with less than 1% aDNA. Sequencing such a sample is incredibly inefficient: less than 1% of the sequence is actually useful, so only the best-funded studies can afford the sequencing needed to generate a high-coverage genome. Therefore, most aDNA studies focus on mitochondrial DNA, using targeted capture methods or PCR amplicon sequencing. This reduces wasted sequencing capacity but greatly limits the amount of information obtained from the sample.
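A back-of-the-envelope calculation shows why a 1% endogenous fraction is so costly. The Python sketch below assumes a human genome of roughly 3.1 billion bases and an average usable read length of 50 bases (aDNA fragments are short); both numbers are illustrative assumptions, and the function name is hypothetical.

```python
import math

def raw_reads_needed(genome_size_bp: float, coverage: float,
                     read_length_bp: float, endogenous_fraction: float) -> int:
    """Rough count of raw reads needed for the endogenous reads alone to reach
    `coverage`. Ignores PCR duplicates, unmappable reads and uneven coverage,
    all of which push the real number higher."""
    endogenous_reads = genome_size_bp * coverage / read_length_bp
    return math.ceil(endogenous_reads / endogenous_fraction)

GENOME_BP = 3.1e9   # approximate human genome size (assumption)
READ_LEN = 50       # assumed average usable length of short aDNA fragments

for frac in (0.01, 0.10, 0.50):
    n = raw_reads_needed(GENOME_BP, 1, READ_LEN, frac)
    print(f"{frac:.0%} endogenous: ~{n / 1e9:.2f} billion reads for 1x coverage")
```

Under these assumptions, even a single-fold genome at 1% endogenous DNA needs billions of raw reads, whereas at 50% endogenous the same coverage needs about fifty times fewer.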

Getting the most out of ancient DNA

Meredith Carpenter, a postdoc here at Stanford, and her colleagues have developed a novel method called whole-genome in-solution capture (WISC) to purify aDNA from next-generation sequencing libraries generated from fossil DNA samples. The authors make use of RNA bait technology (Figure 1), which uses RNA that is complementary to the targeted DNA and has been synthesized with biotinylated nucleotides (Gnirke et al 2009). The RNA is hybridized to the DNA pool, and the hybrids are purified from solution by binding the biotin to commercially available purification beads. Eluting the hybrids from the beads and removing the RNA then leaves a pool greatly enriched for the targeted DNA. Previous methods using RNA baits required synthesizing DNA templates complementary to the baits on microarrays, then transcribing them in vitro to generate the RNA bait library. This approach has been used successfully to capture the entirety of chromosome 21 from human fossils (Fu et al 2013), but it is not cost-effective for producing a bait library that covers the entire human genome.
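Why does pulling down bait-bound fragments raise the endogenous share so sharply? A toy model (an illustrative assumption, not an analysis from the paper) helps: suppose bait-bound target fragments survive the bead pull-down with some probability, while background fragments are carried over only by nonspecific sticking. The retention rates below are hypothetical and chosen only to show the shape of the effect.

```python
def after_capture_fraction(endogenous_fraction: float,
                           on_target_retention: float,
                           off_target_retention: float) -> float:
    """Endogenous share of a library after one round of capture, in a toy
    model where bait-bound fragments survive pull-down with probability
    on_target_retention and everything else sticks nonspecifically with
    probability off_target_retention."""
    kept_target = endogenous_fraction * on_target_retention
    kept_background = (1.0 - endogenous_fraction) * off_target_retention
    return kept_target / (kept_target + kept_background)

before = 0.01  # "less than 1%" endogenous, as in the text
for off in (0.05, 0.01, 0.001):  # hypothetical background carry-over rates
    after = after_capture_fraction(before, on_target_retention=0.5, off_target_retention=off)
    print(f"background retention {off}: {before:.0%} -> {after:.1%} ({after / before:.1f}x)")
```

In this model the achievable enrichment is limited mainly by how much background DNA sticks nonspecifically, which is why steps that reduce nonspecific binding matter.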

Carpenter and coauthors circumvent this by mechanically fragmenting human genomic DNA and using blunt-end ligation to attach adapter sequences containing an RNA polymerase promoter. This modified DNA library is then used to generate the biotinylated RNA bait library by in vitro transcription, after which the RNA is purified using standard protocols. The authors also generate non-biotinylated RNA complementary to the adapter sequences carried by every fragment in the aDNA library, to block nonspecific binding between adapters. The bait and blocking RNA are hybridized to the aDNA libraries, the bait-bound fragments are pulled down with beads to select the human aDNA, and the captured fragments are sequenced. The enrichment is estimated to cost about $50 per sample, which makes it accessible to most labs conducting aDNA genomic studies.
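The blocking RNA works because it is complementary to the adapter. Since paired strands run antiparallel, "complementary" means complementing each base and reversing the order, with U in place of T on the RNA side. The Python sketch below shows that relationship; the adapter sequence is a made-up placeholder, not a real Illumina adapter, and the function names are hypothetical.

```python
# DNA base -> the RNA base that pairs with it (U replaces T in RNA).
PAIRING_RNA_BASE = {"A": "U", "C": "G", "G": "C", "T": "A"}

def blocking_rna(adapter_dna: str) -> str:
    """RNA (written 5'->3') that hybridizes to the given DNA strand (5'->3').
    Paired strands run antiparallel, so each base is complemented and the
    order is reversed."""
    return "".join(PAIRING_RNA_BASE[base] for base in reversed(adapter_dna.upper()))

def hybridizes(rna: str, dna: str) -> bool:
    """True if every base of `rna` pairs with the antiparallel base of `dna`."""
    return len(rna) == len(dna) and all(
        PAIRING_RNA_BASE[d] == r for d, r in zip(dna.upper(), reversed(rna))
    )

ADAPTER = "GATTACAGGCT"  # hypothetical adapter fragment, not a real Illumina sequence
block = blocking_rna(ADAPTER)
print(block, hybridizes(block, ADAPTER))
```

Because the blocker carries no biotin, adapters it covers cannot pair with one another, yet they are not pulled down by the beads either.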

WISC greatly enriches for ancient DNA across a variety of samples

The authors tested this method on a variety of aDNA libraries prepared from both high- and low-quality samples, including hair, teeth, and bones, and from tropical and more temperate regions (a factor that can greatly influence DNA quality). They sequenced the libraries both before and after WISC and found a 3- to 13-fold increase in the number of uniquely mapped reads after WISC. In addition, for both hair and bone samples, about five million reads were enough to sequence most of the unique reads in the enriched library. WISC therefore allows most of the endogenous sequence to be read from dozens of aDNA samples in a single lane of an Illumina HiSeq, opening the possibility of sequencing the millions of fossils in museums and collections around the world.
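The idea that a few million reads can capture most of an enriched library's unique molecules follows from a simple sampling argument. The sketch below is a minimal illustration, assuming reads are drawn uniformly at random with replacement (ignoring PCR bias) and using the standard Poisson approximation. The library size is hypothetical, not a value from the paper.

```python
import math

def expected_unique_seen(library_size: float, reads: float) -> float:
    """Expected number of distinct molecules observed after drawing `reads`
    reads at random (with replacement) from a library of `library_size`
    distinct molecules, using the Poisson approximation M * (1 - exp(-n / M))."""
    return library_size * -math.expm1(-reads / library_size)

LIBRARY = 2_000_000  # hypothetical number of unique molecules after capture
for reads in (1_000_000, 5_000_000, 20_000_000):
    frac = expected_unique_seen(LIBRARY, reads) / LIBRARY
    print(f"{reads:>10,} reads: ~{frac:.0%} of unique molecules seen")
```

Past the point where reads outnumber unique molecules a few times over, extra sequencing mostly re-reads duplicates. This is why many enriched samples can share a single lane.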

Dr. Carpenter says they are now focusing on adapting the method to remove human DNA contamination from microbiome sequencing projects, with promising preliminary results, as well as on applications in forensics and the study of extinct species. Because WISC generates the RNA bait library from the genomic DNA of an extant relative instead of from synthetic DNA arrays, bait libraries can be prepared even when the genome of the bait source organism has not been sequenced. Combined with recent advances in aDNA library construction methods (Meyer et al 2012), WISC promises to make sequencing of contaminated and degraded samples widely accessible.

Carpenter et al (2013) Figure 1. Schematic of the Whole-Genome In-Solution Capture Process
To generate the RNA “bait” library, a human genomic library is created via adapters containing T7 RNA polymerase promoters (green boxes). This library is subjected to in vitro transcription via T7 RNA polymerase and biotin-16-UTP (stars), creating a biotinylated bait library. Meanwhile, the ancient DNA library (aDNA “pond”) is prepared via standard indexed Illumina adapters (purple boxes). These aDNA libraries often contain <1% endogenous DNA, with the remainder being environmental in origin. During hybridization, the bait and pond are combined in the presence of adapter-blocking RNA oligos (blue zigzags), which are complementary to the indexed Illumina adapters and thus prevent nonspecific hybridization between adapters in the aDNA library. After hybridization, the biotinylated bait and bound aDNA are pulled down with streptavidin-coated magnetic beads, and any unbound DNA is washed away. Finally, the DNA is eluted and amplified for sequencing.

Paper author Meredith Carpenter is a postdoc in Carlos Bustamante’s lab.


Carpenter, M. L., Buenrostro, J. D., Valdiosera, C., Schroeder, H., Allentoft, M. E., Sikora, M., Rasmussen, M., et al. (2013). Pulling out the 1%: Whole-Genome Capture for the Targeted Enrichment of Ancient DNA Sequencing Libraries. The American Journal of Human Genetics, 1–13. doi:10.1016/j.ajhg.2013.10.002

Fu, Q., Meyer, M., Gao, X., Stenzel, U., Burbano, H. A., Kelso, J., & Pääbo, S. (2013). DNA analysis of an early modern human from Tianyuan Cave, China. Proceedings of the National Academy of Sciences of the United States of America, 110(6), 2223–7. doi:10.1073/pnas.1221359110

Gnirke, A., Melnikov, A., Maguire, J., Rogov, P., LeProust, E. M., Brockman, W., Fennell, T., et al. (2009). Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted sequencing. Nature Biotechnology, 27(2), 182–9. doi:10.1038/nbt.1523

Meyer, M., Kircher, M., Gansauge, M.-T., Li, H., Racimo, F., Mallick, S., Schraiber, J. G., et al. (2012). A high-coverage genome sequence from an archaic Denisovan individual. Science, 338(6104), 222–6. doi:10.1126/science.1224344

Update: The second paragraph originally talked about fossils, but it should have been bones (now corrected). A fossil is mineralized and would not yield aDNA.