Why the Y?

Blog author Amy Goldberg is a graduate student in Noah Rosenberg’s lab.
While mitochondrial DNA has been extensively sequenced for decades because of its short length and abundance, the Y chromosome has been under-studied. Unlike autosomal DNA, the mitochondria and (most of) the Y chromosome are inherited exclusively maternally and paternally, respectively. Therefore, they do not undergo meiotic recombination. Without recombination, mutations accumulate on a stable background, preserving a wealth of information about population history. Each background, shared through a common ancestor, is called a haplogroup. To leverage this information, Poznik et al. set out to sequence 69 males from nine diverse human populations, including a large representation of African individuals. The paper, published in Science last summer, is by Stanford graduate student David Poznik and a group led by CEHG professor Dr. Carlos Bustamante.
The structure of the Y chromosome is complex, with large heterochromatic regions, pseudo-autosomal regions that recombine with the X chromosome, and repetitive elements, all of which make mapping reads difficult. But the Y chromosome is haploid, which allows accurate variant calls at lower coverage than on the autosomes, where heterozygous sites must also be distinguished. Using high-throughput sequencing (3.1x mean coverage) and a haploid expectation-maximization algorithm, Poznik et al. called genotypes with an error rate around 0.1%. The paper developed important methods for analyzing high-throughput sequences of the difficult Y chromosome, including determining the subset of regions within which accurate genotypes can be called.
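To see why haploidy helps at low coverage, here is a minimal sketch of a two-genotype Bayesian call at a single site. It is not the expectation-maximization algorithm used in the paper. The function name, the per-read error rate, and the prior on the alternate allele are all assumptions chosen for illustration.

```python
import math

def posterior_alt(ref_reads, alt_reads, error_rate=0.01, prior_alt=0.001):
    """Posterior probability that a haploid site carries the alternate allele.

    error_rate and prior_alt are illustrative assumptions, not values
    taken from the paper.
    """
    # If the true allele is the reference, every alternate read is an error.
    log_ref = (ref_reads * math.log(1 - error_rate)
               + alt_reads * math.log(error_rate)
               + math.log(1 - prior_alt))
    # If the true allele is the alternate, every reference read is an error.
    log_alt = (alt_reads * math.log(1 - error_rate)
               + ref_reads * math.log(error_rate)
               + math.log(prior_alt))
    # Normalize over the only two genotypes a haploid site can have.
    top = max(log_ref, log_alt)
    w_ref = math.exp(log_ref - top)
    w_alt = math.exp(log_alt - top)
    return w_alt / (w_ref + w_alt)

if __name__ == "__main__":
    # Three reads is roughly what 3.1x mean coverage gives at a typical site.
    for ref, alt in [(3, 0), (0, 3), (1, 2)]:
        print(ref, alt, posterior_alt(ref, alt))
```

With no heterozygous state to consider, three concordant reads are enough to make one genotype overwhelmingly more likely than the other under these assumptions. A diploid caller would also have to rule out a heterozygote, which takes more reads.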
Reconstructing the human Y-chromosome tree
Poznik et al. constructed a phylogenetic tree of the Y chromosome using sequence data and a maximum likelihood approach. While the overall structure of the tree was known, Poznik et al. were able to accurately calculate branch lengths based on the number of variants differing between individuals and resolve previously indeterminate finer structure.
Incredible African Diversity: One of the key findings of the paper was the depth of diversity within African lineages. While both uniparental and autosomal markers have indicated an African root for human diversity, Poznik et al. find lineages within a single population, the San hunter-gatherers, that coalesce almost at the same time as the entire tree (see haplogroup A). This indicates that African diversity and structure have existed for tens of thousands of years, and there is likely more to discover. A large sample of African populations was considered, which led to the discovery of previously unseen structure within haplogroup B2, including structure not mirrored by modern population clustering, that dates to approximately 35,000 years ago.
Evidence of population expansion: Short internal branches of the tree, such as those seen within haplogroup E and the non-African group FT, indicate periods of rapid population growth. When a population expands quickly, new variants that might otherwise drift to extinction can persist. A large number of coalescence events occur at the time of growth, as there were fewer lineages alive in the population before this time. For non-African haplogroups, this pattern is likely a remnant of the Out of Africa migration. For haplogroup E, this corresponds to the Bantu agricultural expansion.
Resolved Eurasian polytomy: Previously, the topology of the Eurasian tree separating haplogroups G-H-IJK was unresolved. Because of the higher-coverage sequencing in this study, Poznik et al. found a single variant, a C to T transition, that differentiates G from the other groups. Haplogroup G retains the ancestral variant, while H-IJK share the derived variant and are therefore more closely related to each other.
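The logic of resolving the polytomy with one shared derived allele can be sketched in a few lines. The allele assignments below simply restate the description above (G keeps the ancestral C, H and IJK carry the derived T). The function name is hypothetical, and a real analysis would weigh many sites rather than one.

```python
ANCESTRAL = "C"

# Allele carried by each haplogroup at the informative site, as described above.
alleles = {"G": "C", "H": "T", "IJK": "T"}

def split_by_derived_allele(alleles, ancestral):
    """Group lineages by whether they carry the derived or ancestral allele."""
    derived = sorted(name for name, a in alleles.items() if a != ancestral)
    retained = sorted(name for name, a in alleles.items() if a == ancestral)
    return derived, retained

derived_clade, outgroup = split_by_derived_allele(alleles, ANCESTRAL)
print("share the derived allele:", derived_clade)
print("retain the ancestral allele:", outgroup)
```

Lineages that share a derived allele inherited it from an ancestor they have in common, so they form a clade to the exclusion of lineages that still carry the ancestral state.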
Sequencing vs. genotyping
Whereas previous studies analyzed small repetitive elements called microsatellites or small sets of single base-pair changes called SNPs, whole-genome sequencing data contain not only more information, but potentially more accurate information. In particular, before the advent of high-throughput sequencing, SNPs were usually ascertained in a subset of individuals that did not capture worldwide diversity levels. Diversity measures based on such SNPs are therefore biased and often underestimated. Without sequence data, the branch lengths of the tree did not have a meaningful interpretation, and the depth of variation within Africa was not seen.
MRCA of Human Maternal and Paternal Lineages
There was a lot of public discussion spurred by the publication of Poznik’s paper last year. The discussion mainly focused on the result that, contrary to previous estimates, the most recent common ancestor (MRCA) of all mitochondrial DNA lived at about the same time as that of all Y chromosomes. Previous estimates put the mitochondrial time to the most recent common ancestor (TMRCA) around 200 thousand years ago, with the Y chromosome coalescing a bit over 100 thousand years ago. These estimates for the Y and mitochondria were often obtained through different sequencing and analysis methods, and are therefore hard to compare. In particular, varying estimates of the mutation rates have led to different TMRCA estimates. By analyzing both the Y and mitochondria in the same framework, calibrated by archeological evidence and within-species comparisons, Poznik et al. found largely overlapping confidence intervals for the TMRCA of both Y and mitochondria.
But should the coalescence times of the mitochondria and the Y chromosome be the same? Not necessarily. While discrepancies between the mitochondria and Y chromosome have often been interpreted as evidence of sex-biased population histories or sizes, strictly neutral models can predict large differences between the two as well. Because neither the analyzed part of the Y chromosome nor the mitochondria undergo recombination, each acts as a single locus and therefore represents the history of a single lineage. For a given population history, the ages at which lineages coalesce follow a wide distribution. The Y chromosome and the mitochondria are just two loci, with largely independent histories (given the overall population history), so they may differ by chance alone. Similarly, different loci across autosomal DNA have TMRCA ranging from thousands to millions of years. Additionally, because each is a single locus, any effects of selection would distort the entire genealogy of the Y chromosome or the mitochondria.
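The point about chance differences can be explored with a small simulation of the standard neutral coalescent. This is a sketch under simplifying assumptions that are not from the paper: constant population size, the same effective size for both loci, and no selection. Time is measured in units of N generations, where N is the haploid population size for the locus. The function names and the twofold threshold are illustrative choices.

```python
import random

def simulate_tmrca(n_lineages, rng):
    """Draw one TMRCA for a sample of n_lineages under the neutral coalescent.

    While k lineages remain, the waiting time to the next coalescence is
    exponential with rate k*(k-1)/2 (time in units of N generations).
    """
    t = 0.0
    for k in range(n_lineages, 1, -1):
        t += rng.expovariate(k * (k - 1) / 2)
    return t

def fraction_twofold(n_lineages=50, replicates=10000, seed=1):
    """Fraction of replicates in which two independent loci differ at least
    twofold in TMRCA, despite sharing the same population history."""
    rng = random.Random(seed)
    count = 0
    for _ in range(replicates):
        a = simulate_tmrca(n_lineages, rng)
        b = simulate_tmrca(n_lineages, rng)
        if max(a, b) >= 2 * min(a, b):
            count += 1
    return count / replicates

if __name__ == "__main__":
    print("fraction with a twofold or larger difference:", fraction_twofold())
```

Each call to simulate_tmrca plays the role of one non-recombining locus. Comparing pairs of draws shows how often two such loci would disagree substantially with no sex bias or selection involved.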
Future directions
Human population history is far from fully fleshed out, and Poznik et al. provide a framework to leverage increasingly available high-throughput sequencing of Y chromosomes. The method used to calculate the mutation rate and TMRCA is a valuable contribution in itself, with applications to a wide range of evolutionary and ecological questions. This study demonstrated that we have only characterized a fraction of worldwide diversity, particularly in Africa, and that increased sampling will be critical to parsing close and far ties in human history.