A Snapshot of #bigdatamed tweets and Facebook Posts

CLICK HERE FOR FEED

As you can see from the curated Twitter and Facebook feed included at the link above, Stanford’s Center for Computational, Evolutionary and Human Genomics (CEHG) had a strong presence at this year’s Big Data Conference. In addition to the numerous CEHG community members in the audience (and livestreaming the event), we also had CEHG faculty members Russ Altman, Euan Ashley, Carlos Bustamante, Mildred Cho, Hank Greely, Susan Holmes, and Julia Salzman on the stage, serving as session moderators and featured speakers.

We hope you enjoy this Storification of Facebook posts and tweets posted before, during, and after the event. Please note: this stream is not comprehensive, but is rather a snapshot of the conversations surrounding each event, identified by the hashtag #bigdatamed. Want to learn more about CEHG? Follow us on Facebook and @StanfordCEHG and read our blog at stanfordcehg.wordpress.com. Want more info about Big Data in Biomedicine? For the detailed agenda, videos from the 2015 conference, and more, click here.

 

 

BAPGXII Saturday May 30, 2015

Stanford is hosting the 12th Bay Area Population Genomics (BAPG) meeting. The Bay Area Population Genomics meeting is a great place to (re)connect with your pop gen/genomics colleagues in the area and to present your work in a talk or a poster.

BAPGXI, held in December at UC Davis, was a great event with over 100 participants and a lineup of excellent talks. Thanks to the Coop lab! You can read more here, including the storified tweets. We are excited to continue this success at Stanford!

Logistics

UPDATE: Click here for detailed event program.

The meeting will take place on May 30th on the Stanford campus in the Alway building, room M106. We start at 8:30am with breakfast and registration, Dr. Dmitri Petrov’s opening remarks will begin at 9:25am, and the first talk will be at 9:30am. The last talk (Dr. Jonathan Pritchard’s keynote) ends at 2:10pm, followed by a poster session with amazing wine, beer, and cheese! Here is a general outline of the agenda, to help you plan your day:

Breakfast and Registration in Alway Courtyard 8:30-9:25am (pick up your BAPGXII gift!)
Opening Remarks 9:25-9:30am
Talk Session 1 9:30-10:30am (20 mins per talk)
Coffee Break in Courtyard 10:30-11am
Talk Session 2 11am-12pm (20 mins per talk)
Lunch in Courtyard 12-1pm
Talk Session 3 and Keynote 1-2:10pm (two 20 min talks and one 30 min talk)
Poster Session with Wine, Beer, and Cheese Reception at 2:10pm, ends at 3pm

Talks and Posters

Sorry. Speaker and poster slots are now full. No longer accepting sign-ups.

How to Attend BAPGXII

1. Please register here by 10am Friday, May 29th to join us at BAPGXII. Registration is free and open to all, but required.

2. Encourage your colleagues to sign up! Forward this email to your lab mailing list and watch for updates on the CEHG Facebook page and on Twitter @StanfordCEHG. Help us get the momentum going by tweeting us using #BAPGXII.

3. And finally, once you’ve signed up, all you need to do is get up early and ride-share, VTA/Caltrain or bike to our beautiful campus on May 30th. Come for the science, stay for the social! Use the Stanford campus map and this Google Map to find the Alway Building, located at 500 Pasteur Drive, Stanford, CA. Be green and consider ride-sharing: there is a dedicated tab for making travel plans in the sign up doc!

We hope to see you at Stanford!

The BAPGXII organizing committee: Bridget Algee-Hewitt (@BridgetAH), David Enard (@DavidEnard), Katie Kanagawa (@KatieKanagawa), Alison Nguyen, Dmitri Petrov (@PetrovADmitri), Susanne Tilk, and Elena Yujuico. If you have any questions, feel free to contact Bridget Algee-Hewitt at bridgeta@stanford.edu.

To follow BAPGXII on Twitter, check out the hashtag #BAPGXII and also follow @StanfordCEHG.

Imagining Phylogenetics and Recombination as Art

About the Artist:
Daniel Friedman is a 1st year Ph.D. student in the Ecology and Evolution program. Working from the Gordon lab, he mainly studies the evolution of collective behavior in ants. Other research interests include fractals, burritos, and metaphors. Contact: dfri@stanford.edu.

 

1. Phylogenetics
Ever since “I Think…”, the idea of a bifurcating tree of species relations has guided evolutionary biology. This piece of paper with ink on it plays with the idea of an “evolutionary I”, styled as an evolving Eye. Whether we perform molecular studies on the ontogenetic role of Pax6, or psychophysical explorations into the Self, we are confronted with questions of homology and convergence. Our time-reversible phylogenetic algorithms, so designed for computational simplicity, only contribute to this problem of post hoc ergo propter hoc – “after this, therefore because of this.” The Modern Synthesis was clearly a rEvolutionary moment – now are we ready for a Post-Modern Synthesis?
2. Recombination
DNA recombination is key to many biological processes. Recombination between homologous chromosomes during meiosis creates novel combinations of alleles, and to many, is the teleological “Why?” of Sex. But the reach of recombination goes far, far beyond Sex. Recombination between alleles of the same locus allows a kaleidoscope of DNA error-correcting mechanisms to proceed. And over evolutionary time scales, “errors” in recombination provide large structural creativities in the genome, such as duplications, deletions, and inversions. Recombination during immune cell maturation allows the human body to recognize an essentially infinite cohort of potential invaders. And now that recombination has been mechanistically deconvoluted, derived technologies facilitate guided DNA editing in vitro and in vivo. Recombination is molecular innovation embodied, a topological whirligig, and the workhorse of the genome.

Fast Algorithm Infers Relationships in High-Dimensional Datasets

Post author Henry Li is a graduate student in the Wong Lab.

New research harnesses the powers of singular value decomposition (SVD) and sparse learning to tackle the problem of inferring relationships between predictors and responses in large-scale, high-dimensional datasets.

Addressing problems in computation speed, assumptions of sparsity, and algorithm sensitivity

One major challenge that statisticians face when inferring relationships is that modern data is big and the underlying true relationships between predictors and responses are sparse and multilayered. To quickly establish connections in these datasets, Ma et al. utilize a combination of SVD and sparse learning, called thresholding SVD (T-SVD). This new algorithm solves many issues that have plagued the Statistics and Big Data communities, such as the problems of computation speed, the assumption of sparsity, and the sensitivity of the algorithm to positive results. In their simulation study, T-SVD is shown to be faster and more sensitive than existing methods such as the sequential extraction algorithm (SEA) and the iterative exclusive extraction algorithm (IEEA). As a result, the multilayered relationships between predictors and responses, which come in the form of multidimensional matrices, can be learned quickly and accurately.
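
To give a feel for what combining SVD with thresholding can look like, here is a minimal sketch in Python. It is not the T-SVD procedure of Ma et al.; it is a generic thresholded power iteration that pulls one sparse rank-one "layer" out of a predictor-by-response coefficient matrix. The function name `sparse_rank_one_layer`, the thresholds `tau_u` and `tau_v`, and the toy matrix `C` are all assumptions made up for this illustration.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def _hard_threshold(vec, tau):
    """Set entries with absolute value below tau to zero."""
    return [x if abs(x) >= tau else 0.0 for x in vec]


def sparse_rank_one_layer(C, tau_u, tau_v, iterations=50):
    """Illustrative thresholded power iteration (hypothetical helper).

    C is a p x q matrix given as a list of rows. Returns (d, u, v) so that
    d * outer(u, v) is a sparse rank-one approximation of C. The thresholds
    are applied to unit-length vectors, so they should lie between 0 and 1.
    """
    p, q = len(C), len(C[0])
    v = _normalize([1.0] * q)  # simple starting guess
    u = [0.0] * p
    for _ in range(iterations):
        # Left vector: multiply, normalize, threshold, renormalize.
        u = _normalize([sum(C[i][j] * v[j] for j in range(q)) for i in range(p)])
        u = _normalize(_hard_threshold(u, tau_u))
        # Right vector: same steps using the transpose of C.
        v = _normalize([sum(C[i][j] * u[i] for i in range(p)) for j in range(q)])
        v = _normalize(_hard_threshold(v, tau_v))
    d = sum(u[i] * C[i][j] * v[j] for i in range(p) for j in range(q))
    return d, u, v


# Toy example: 5 predictors, 4 responses. Only predictors 0 and 3 act on
# responses 0 and 1; every other entry is small noise.
C = [
    [2.00, 2.05, 0.03, -0.02],
    [0.04, -0.03, 0.02, 0.01],
    [-0.02, 0.03, -0.04, 0.02],
    [-1.95, -2.00, 0.01, 0.03],
    [0.01, 0.02, -0.03, -0.01],
]
d, u, v = sparse_rank_one_layer(C, tau_u=0.2, tau_v=0.2)
print("active predictors:", [i for i, x in enumerate(u) if x != 0.0])
print("active responses:", [j for j, x in enumerate(v) if x != 0.0])
```

In this toy setting the zero entries of `u` and `v` mark the predictors and responses left out of the layer, which is the kind of sparse structure the post describes. Choosing the thresholds well is the hard part in practice and is not addressed by this sketch.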

Uncovering new regulatory networks

Demonstrating the application of T-SVD, Ma et al. showed that new biological insights can be gained from using T-SVD to analyze datasets from The Cancer Genome Atlas consortium. The authors focused on the ovarian cancer gene expression datasets, in which the sample size is much smaller than the number of regulators and responses measured in the study. As in a typical genomic experiment, tens of thousands of genes were probed for their expression levels; from pathway studies, we know that very few of these genes form control switches that govern the expression levels for the rest of the genome. Ma et al. inferred two different relationships, based on microRNA (miRNA) or long noncoding RNA (lncRNA). The authors showed that these regulatory relationships specifically match established cancer pathways very well. Geneticists now have two new regulatory networks to mine for understanding the roles of miRNAs and lncRNAs.

In short, T-SVD is an exciting algorithm that pushes the Statistics field forward by offering a new lens to look at large-scale multidimensional datasets. With this approach, statisticians and users of statistics, like geneticists, can gain new insights into existing datasets and tackle new research problems.

References

Ma, Xin, Luo Xiao, Wing Hung Wong. Learning regulatory programs by threshold SVD regression. Proc Natl Acad Sci USA. 2014 Nov 4; 111(44). DOI: 10.1073/pnas.1417808111

Paper author, Xin (Maria) Ma is a research associate in the Wong Lab.

Afterword: CEHG Genetics and Society Symposium 2015

Founded in 2012, CEHG is a research program that fosters interdisciplinary research. Home to more than 25 faculty and more than 200 grads and postdocs, CEHG bridges the divides between various member labs across Stanford campus.

The 2015 CEHG Genetics and Society Symposium (GSS15), which took place on April 13th and 14th in Stanford’s Paul Brest Hall, was a smashing success. It featured 25 speakers from Stanford campus and the San Francisco Bay academic and scientific industry communities. Approximately 175 Stanford affiliates and non-affiliates came together to celebrate the Center’s spirit of interdisciplinary collaboration and meet with experts in the fields of computational, evolutionary and human genomics. This is a significant increase from last year’s 150 attendees!

The Mission:

The Genetics and Society Symposium is integral to CEHG’s mission: it provides postdocs and graduate fellows with the opportunity to share their developing research with faculty advisors and their colleagues, encourages conversation between faculty working in diverse scientific disciplines across campus, and introduces CEHG members to speakers from around the Bay Area and beyond (and vice versa).

The Venue:

As you can see from our photos of the space and catering service, Paul Brest Hall was the perfect home for this year’s two-day symposium. The hall was spacious, the food delicious, the staff hands-on, and the outdoor picnic area well suited for our lunch and coffee breaks. We enjoyed the venue so much, in fact, that CEHG staff are currently in the process of booking the space for next year!

The Speakers:

GSS15 featured four brilliant keynote speakers, each distinguished in his/her field of research.

Gene Myers and CEHG Exec Committee members Marc Feldman, Chiara Sabatti, and Carlos Bustamante

Founding director of a new Systems Biology Center at the Max-Planck Institute of Molecular Cell Biology and Genetics, Dr. Eugene (Gene) Myers presented his open-sourced research on the resurrection of de novo DNA sequencing. Best known for developing BLAST, the most widely used tool in bioinformatics, and for the assembler he developed at Celera that delivered the fly, human, and mouse genomes in a three-year period, Dr. Myers participated in GSS15 courtesy of DNAnexus. Follow his blog: https://github.com/thegenemyers.

Co-founding director Carlos Bustamante and Ed Green catch up during a break at GSS15.

Assistant Professor in Biomolecular Engineering at the University of California, Santa Cruz, Richard (Ed) Green presented his research on a novel approach for highly contiguous genome assemblies, which draws on his work as an NSF Fellow at the Max Planck Institute in Leipzig, Germany and head of an analysis consortium responsible for publishing the draft genome sequence of Neanderthal. Click here for his 2014 CARTA talk, “The Genetics of Humanness: The Neanderthal and Denisovan Genomes.”

Dr. Michelle Mello, Stanford Law School and School of Medicine

Michelle Mello, Professor of Law at Stanford Law School and Professor of Health Research and Policy in Stanford’s School of Medicine, presented findings from her extensive research on the ethics of data sharing. As the author of more than 140 articles and book chapters on the medical malpractice system, medical errors and patient safety, public health law, research ethics, the obesity epidemic, and pharmaceuticals, Dr. Mello provided a valuable perspective from the intersections of law, ethics, and health policy. Click here to read Dr. Mello’s SLS profile.

Dr. Ami Bhatt, Stanford Medicine

Ami Bhatt shared her passion for improving outcomes for patients with hematological malignancies in her talk, “Bugs, drugs, and cancer.” Best known for her recent work demonstrating the discovery of a novel bacterium using sequence-based analysis of a diseased human tissue, her research has been presented nationally and internationally and published in 2013 in the New England Journal of Medicine. Click here for links to Dr. Bhatt’s CAP profile and lab homepage.

 

We had a large group of CEHG faculty members at this year’s event, showcasing the cutting edge research being done in CEHG labs across Stanford campus and indicating considerable faculty commitment to ensuring the Center’s continuing success.

Our symposium would not be complete without our invited CEHG Fellows. These speakers were nominated by organizing committee members to present on topics relating to their CEHG-funded research projects. These young scholars embody CEHG’s continuing commitment to provide funding support to researchers as they transition from graduate studies to postdoctoral scholarships.

The Workshop:

There was standing room only when facilitators Chiara Sabatti (Associate Professor of Health Research and Policy at Stanford), Ken Lange (Chair of the Human Genetics Department at UCLA), and Suyash Shringarpure (postdoctoral scholar in Stanford’s Bustamante Lab) presented their approaches to contemporary problems in statistical genetics!

Social Media:

Did you know? CEHG is on social media!

GSS15 social media moderators Bridget Algee-Hewitt, Jeremy Hsu, Katie Kanagawa, and Rajiv McCoy posted live throughout both days of the event. And our efforts to reach the larger community paid off, with a total reach of 815 on Facebook and more than 7,000 impressions on Twitter!

To catch up on our GSS15 coverage, check out our Facebook page at https://www.facebook.com/StanfordCEHG?ref=hl and our Twitter feed @StanfordCEHG. Follow both to make sure you are the first to know when we post CEHG-related news and announcements.

Want to know when speaker videos from the symposium will be available on CEHG’s forthcoming YouTube channel? Follow us on Facebook and Twitter!

Special Thanks:

From left to right: Bridget Algee-Hewitt, Cody Sam, Yang Li, Anand Bhaskar, and Katie Kanagawa

The GSS15 organizing committee—including Bridget Algee-Hewitt, Anand Bhaskar, Katie Kanagawa, Yang Li, and Cody Sam—would like to take this opportunity to thank CEHG Directors Carlos Bustamante and Marc Feldman, Executive Committee members Hank Greely, Dmitri Petrov, Noah Rosenberg, and Chiara Sabatti, event volunteers Alex Adams, Maude David, and Chris Gignoux, event photographer Deneb Semprum, and everyone who attended this year’s symposium.

We hope you enjoyed attending as much as we enjoyed working behind-the-scenes. We hope to see you all again at GSS16! If you are interested in volunteering for future CEHG events, please contact us at stanfordcehg@stanford.edu.

Upcoming CEHG events:

Don’t miss our popular weekly Evolgenome seminar series, which will continue through Spring term, usually on Wednesdays at noon (location varies). Lunch is always provided. Details will follow, but here is a quick overview so you can mark your calendars!

April 29: Fernando Racimo (Nielsen/Slatkin Lab)
May 6: Pleuni Pennings (UCSF)
May 20: Kelly Harkin
June 3: Sandeep Venkataram (Petrov Lab)
June 10: Emilia Huerta-Sanchez

Link to Stanford News Center Press Release: “Centuries old DNA helps identify specific origins of slave skeletons found in Caribbean”

Maria Avila-Arcos, postdoctoral CEHG scholar and Bustamante Lab member

Greetings CEHG community!

Click on the link below to read more about this fascinating collaboration between CEHG postdoctoral member Maria Avila-Arcos, CEHG faculty member Dr. Carlos Bustamante, Hannes Schroeder and Thomas Gilbert (both from the University of Copenhagen), and Bustamante lab members David Poznik and Martin Sikora.

http://med.stanford.edu/news/all-news/2015/03/ancient-dna-helps-identify-specific-origins-of-slave-skeletons.html

As Krista Conger, science writer for Stanford Medical School’s Office of Communication & Public Affairs, writes about this groundbreaking study, “The research marks the first time that scientists have been able to use such old, poorly preserved DNA to identify with high specificity the ethnic origins of long-dead individuals. The finding paves the way for a greater understanding of the patterns of the trans-Atlantic slave trade, and may transform the general practice of genealogical and historical research.”

The paper was released online March 9th in the Proceedings of the National Academy of Sciences. Please let me know if you would be interested in writing a blog post in response.

New CEHG blog editor and call for blog submissions!

Hello everyone,

It is my honor and privilege to edit and moderate the CEHG blog. As editor, I will continue to approach the blog as an invaluable opportunity to encourage interactions between CEHG member labs on Stanford campus and to showcase the science that is done in CEHG to others in the CEHG community and to the world outside of CEHG and Stanford University.

I am currently (and perpetually) looking for volunteers to write posts for the CEHG blog (https://stanfordcehg.wordpress.com/). I would like to extend an invitation to those of you who would like to write in a similar style as the previous posts, summarizing and responding to some of the recently published CEHG articles listed at the end of this letter. If you want someone to write about your own article that’s coming out soon, please let me know and I will add your upcoming publication to the list. CEHG-authored reviews of non-CEHG authored books/articles are also welcome.

I would also like to invite contributions from past and present CEHG community members that might not be written in a similar style as previous posts, but that still serve to deepen interactions among CEHG member labs on the Stanford campus and showcase relevant research in the areas of computational, evolutionary, or human genomics. I hope you will let your imagination run wild in considering the possible formats (formal and informal) in which future posts might appear. Possibilities include, but are certainly not limited to:

  • CEHG-authored responses to current events in genetics-related news
  • CEHG-authored responses to scientific conferences attended
  • CEHG-produced creative projects, including photography, artwork, creative writing, comic strips, videos, music, a regular column, etc.

If you are interested in communicating science to the growing CEHG community and the public, let me know!

Thank you very much and I look forward to speaking with you,

Katie M. Kanagawa
Administrative Associate, Bustamante Lab
Communications Coordinator, CEHG
Littlefield 315, Third Floor
kkanagaw@stanford.edu
(650)497-4382

 

Potential papers (from last ~3 months)

“Parente2: a fast and accurate method for detecting identity by descent” (Batzoglou lab).
http://www.ncbi.nlm.nih.gov/pubmed/25273070

“A systematic assessment of linking gene expression with genetic variants for prioritizing candidate targets” (Butte lab). http://www.ncbi.nlm.nih.gov/pubmed/25592598

“Translating personalized medicine using new genetic technologies in clinical practice: the ethical issues” (Butte lab)
http://www.ncbi.nlm.nih.gov/pubmed/25221608

“Echoes of the past: hereditarianism and a troublesome inheritance” (Marcus Feldman)
http://www.ncbi.nlm.nih.gov/pubmed/25502763

“Stability depends on positive autoregulation in Boolean gene regulatory networks” (Feldman lab)
http://www.ncbi.nlm.nih.gov/pubmed/25375153

“Molecular diagnosis of bird-mediated pest consumption in tropical farmland” (Hadly lab)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4216319/

“SWAMP: Sliding Window Alignment Masker for PAML” (Montgomery Lab)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4251194/

“PATH-SCAN: A reporting tool for identifying clinically actionable variants” (Montgomery Lab)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4008882/

“Dissecting the clonal origins of childhood acute lymphoblastic leukemia by single-cell genomics” (Quake lab)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4273416/

“Taxonomic and functional diversity provides insight into microbial pathways and stress responses in the Saline Qinghai Lake, China” (Quake lab)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4218802/

“Autosomal admixture levels are informative about sex bias in admixed populations” (Rosenberg lab)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4224161/

“Learning regulatory programs by threshold SVD regression” (Wong lab)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4226119/

CEHG goes to Puerto Rico for SMBE (posters)

 

The SMBE meeting is one of the most important evolutionary biology meetings of the year. This year (2014) it takes place in Puerto Rico from June 9th till June 12th. The Stanford Center for Computational, Evolutionary and Human Genomics (CEHG) sponsors the event. We are also well represented with 17 talks, 7 posters and 2 symposia that are (co-)organized by CEHG members. Visit us at Stand 9!

Here is a list of the CEHG posters:

POSTER SESSION 1: P-1001 – P-1278

On Show:
Monday 9th – 13:00 – 15:30 / 17:00 – 17:30
Tuesday 10th – 13:00 – 15:30 / 17:00 – 17:30

Manned Session:
Monday 9th – 19:30 – 21:00
Tuesday 10th – 19:00 – 21:00

Shringarpure, Suyash
Fast, scalable and distributed dimensionality reduction of genome-wide data
S1 P-1016

Kimberly McManus
popRange: a highly flexible spatially and temporally explicit forward genetic simulator
S2 P-1041

McCoy, Rajiv
TruSeq synthetic long-reads empower de novo assembly and resolve complex, highly repetitive transposable elements
S5 P-1066

Ioannidis, Nilah
Inferring the effects of genetic variants on gene expression and splicing
S11 P-1145

Karla Sandoval
The genetic basis of preeclampsia in populations adapted to high altitude
S12 P-1186

Chris Gignoux
The Role of Human Demographic History in the Identification of Genetic Associations
S17 P-1265

POSTER SESSION 2: P-2001 – P-2279 & U-2280 – U-2289

On Show:
Wednesday 11th – 11:00 – 11:30 / 13:00 – 15:30 / 17:00 – 17:30
Thursday 12th – 11:45 – 12:15 / 13:45 – 15:15

Manned Session:
Wednesday 11th – 19:00 – 21:30
Thursday 12th – 17:30 – 18:30

McCoy, Rajiv 
Characterizing patterns of human aneuploidy in a large sample of IVF patients
S26 P-2139

Musharoff, Shaila 
A Novel Likelihood Ratio Test for Sex-Biased Demography and the Effect of Cryptic Sex-Bias on the Estimation of Demographic Parameters
S26 P-2151

Martin, Alicia R 
The Genetic Architecture Of Skin Pigmentation In Southern Africa
S34 P-2206

Carja, Oana
On the evolution of mutation in spatially subdivided populations
S35 P-2220

Giltae Song
Pan genome analysis of Saccharomyces cerevisiae
S39 P-2269

CEHG goes to Puerto Rico for SMBE (talks)

The SMBE meeting is one of the most important evolutionary biology meetings of the year. This year (2014) it takes place in Puerto Rico from June 9th till June 12th. The Stanford Center for Computational, Evolutionary and Human Genomics (CEHG) sponsors the event. We are also well represented with 17 talks, 7 posters and 2 symposia that are (co-)organized by CEHG members. Visit us at Stand 9! Here is a list of the CEHG talks:

Monday June 9th

David Enard
A global landscape of protein adaptation to viruses in mammals
Monday 9th June: S9: Evolutionary Networks, 10:00 – 10:15

Tuesday June 10th

Dmitri Petrov
Balancing selection and maintenance of variation as a natural consequence of adaptation in diploids
Tuesday 10th June: S18: Does ploidy matter? Ploidy impacts on evolutionary process, 09:30 – 11:00

Zoe Assaf
Staggered sweeps: The obstruction of adaptation in diploids by recessive, strongly deleterious alleles
Tuesday 10th June: S18: Does ploidy matter? Ploidy impacts on evolutionary process, 10:00 – 10:15

Nicole Creanza
Worldwide linguistic and genetic variation
Tuesday 10th June: S15: Out of Africa: Humans, commensals, pathogens, oh my!, 10:00 – 10:30

David Enard
Genome-wide signals of positive selection in human evolution
Tuesday 10th June: S12: Genomics of adaptation (cont), 12:15 – 12:30

Alan Bergland
Genomic evidence of rapid and stable adaptive oscillations over seasonal time scales in Drosophila
Tuesday 10th June: S12: Genomics of adaptation, 16:15 – 16:30

Ben Wilson
Soft selective sweeps in complex demographic scenarios
Tuesday 10th June: S12: Genomics of adaptation, 17:45 – 18:00

Diamantis Sellis
Widespread heterozygote advantage in diploids
Tuesday 10th June: S12: Genomics of adaptation, 18:45 – 19:00

Wednesday June 11th

Morten Rasmussen
The genome of a Late Pleistocene human from a Clovis burial site in western Montana
Wednesday 11th June: S27: Genomic perspectives on the population history of the Americas, 09:30 – 09:45

María Ávila-Arcos
Tracing the genetic ancestry of enslaved Africans using ancient DNA
Wednesday 11th June: S27: Genomic perspectives on the population history of the Americas, 10:45 – 11:00

Andres Moreno Estrada
Patterns of genetic diversity in Latin America: insights from human population genomics
Wednesday 11th June: S27: Genomic perspectives on the population history of the Americas, 11:30 – 11:45

Philipp Messer
New statistical methods detect both hard and soft sweeps in malaria parasites
Wednesday 11th June: S25: Detecting selection in natural populations: making sense of genome scans and towards alternative solutions, 12:30 – 12:45

Gavin Sherlock
Tracking hundreds of thousands of lineages in an evolving population allows determination of the beneficial mutation rate and elucidation of the distribution of their fitness effects
Wednesday 11th June: S23: Genome-scale Approaches in Experimental Evolution, 12:45 – 13:00

Nandita Garud
Disentangling the effects of demography and selection on haplotype signatures in Drosophila
Wednesday 11th June: S25: Detecting selection in natural populations: making sense of genome scans and towards alternative solutions, 16:15 – 16:30

Carlo Artieri
Accounting for biases in riboprofiling data identifies a conserved effect of proline incorporation as the major determinant of translational stalling
Wednesday 11th June: S24: Creative use of next generation sequencing technology in evolutionary genomics: solving old problems with new approaches, 18:45 – 19:00

Thursday June 12th

Tomas Babak
An atlas of human and mouse genomic imprinting reveals evolutionary causes and consequences
Thursday 12th June: S36: Evolutionary Epigenomics, 13:00 – 13:15

Fernando Mendez
Use of Long-Read Sequence-aided phasing to improve ancestry assignment in admixed populations
Thursday 12th June: S39: Next generation Genome Annotation and Analysis, 16:15 – 16:30

A framework for identifying and quantifying fitness effects across loci

Blog author Ethan Jewett is a PhD student in the lab of Noah Rosenberg.

The degree to which similarities and differences among species are the result of natural selection, rather than genetic drift, is a major question in population genetics. Related questions include: what fraction of sites in the genome of a species are affected by selection? What is the distribution of the strength of selection across genomic sites, and how have selective pressures changed over time? To address these questions, we must be able to accurately identify sites in a genome that are under selection and quantify the selective pressures that act on them.

Difficulties with existing approaches for quantifying fitness effects    

A recent paper in Trends in Genetics by David Lawrie and Dmitri Petrov (Lawrie and Petrov, 2014) provides intuition about the power of existing methods for identifying genomic regions affected by purifying selection and for quantifying the selective pressures at different sites. The paper proposes a new framework for quantifying the distribution of fitness effects across a genome. This new framework is a synthesis of two existing forms of analysis – comparative genomic analyses to identify genomic regions in which the level of divergence among two or more species is smaller than expected, and analyses of the distribution of the frequencies of polymorphisms (the site frequency spectrum, or SFS) within a single species (Figure 1). Using simulations and heuristic arguments, Lawrie and Petrov demonstrate that these two forms of analysis can be combined into a framework for quantifying selective pressures that has greater power to identify selected regions and to quantify selective strengths than either approach has on its own.

Figure 1. Using the site frequency spectrum (SFS) to quantify the strength of purifying selection. The SFS tabulates the number of polymorphisms at a given frequency in a sample of haplotypes. Under neutrality (black dots) many high-frequency polymorphisms are observed. Under purifying selection (higher values of the effective selection strength |4Nes|), a higher fraction of new mutations are deleterious, leading to fewer high-frequency polymorphisms (red and blue dots). Adapted from Lawrie and Petrov (2014).
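
For readers who have not worked with the SFS before, the tabulation described in the caption can be sketched in a few lines of Python. This is only an illustration: the function name `site_frequency_spectrum` and the 0/1 encoding (0 for the ancestral allele, 1 for the derived allele) are assumptions of this example, not something taken from Lawrie and Petrov.

```python
from collections import Counter


def site_frequency_spectrum(haplotypes):
    """Tabulate the unfolded SFS from a list of haplotypes.

    Each haplotype is a sequence of 0/1 values of equal length
    (0 = ancestral allele, 1 = derived allele; an assumed encoding).
    Returns a list whose entry k-1 is the number of sites at which the
    derived allele is carried by exactly k of the n haplotypes, for
    k = 1 .. n-1. Sites that are not polymorphic in the sample
    (derived count 0 or n) are not counted.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    num_sites = len(haplotypes[0])
    if any(len(h) != num_sites for h in haplotypes):
        raise ValueError("all haplotypes must have the same length")
    # zip(*haplotypes) walks over sites (columns) instead of haplotypes (rows).
    derived_counts = Counter(sum(site) for site in zip(*haplotypes))
    return [derived_counts.get(k, 0) for k in range(1, n)]


# Four haplotypes, five sites. The last site is fixed for the derived
# allele in this sample, so it does not appear in the spectrum.
sample = [
    [1, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 1, 1, 0, 1],
    [0, 0, 0, 0, 1],
]
print(site_frequency_spectrum(sample))
```

Purifying selection would show up in such a table as a shift of weight toward the low-frequency classes at the start of the list.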

Lawrie and Petrov begin by discussing the strengths and weaknesses of the two existing approaches. Comparative analyses of genomic divergence are beneficial for identifying genomic regions under purifying selection, which will exhibit lower-than-expected levels of divergence among species. However, as Lawrie and Petrov note, it can be difficult to use comparative analyses to quantify the strength of selection in a region because even mild purifying selection can result in complete conservation among species within the region (Figure 2). For example, whether the population-scaled selective strength, 4Nes, in a region is 20 or 200, the same genomic signal will be observed: complete conservation.

Figure 2. Adapted from Lawrie and Petrov (2014). The evolution of several 100kb regions was simulated in 32 different mammalian species under varying strengths of selection |4Nes|. The number of substitutions in each region was then estimated using genomic evolutionary rate profiling (GERP). The plot shows the median across regions of the number of inferred substitutions. From the plot, it can be seen that, once the strength of selection exceeds a weak threshold value (3 for the example given), there is full conservation among species.

In contrast to comparative approaches, analyses of within-species polymorphisms based on the site frequency spectrum (SFS) within a region can be used to more precisely quantify the strength of selection. For example, Figure 1 shows that different selective strengths can produce very different site frequency spectra. Moreover, if the SFS can be estimated precisely enough, it can allow us to distinguish between two different selective strengths (e.g., 4Nes1 = 20 and 4Nes2 = 200) that would both lead to total conservation in a comparative study, and would therefore be indistinguishable by that approach. The problem is that it takes a lot of polymorphisms to obtain an accurate estimate of the SFS, and a genomic region of interest may contain too few polymorphisms, especially if the region is under purifying selection, which decreases the apparent mutation rate. Sampling additional individuals from the same species may provide little additional information about the SFS because few novel polymorphisms may be observed in the additional sample. For example, recall that for a sample of n individuals from a wildly idealized panmictic species, the expected number of novel polymorphisms observed in the (n+1)st sampled individual is proportional to 1/n (Watterson, 1975).
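To make the diminishing returns concrete, here is a minimal Python sketch of Watterson's (1975) expectation for the number of segregating sites in a sample of n sequences under the standard neutral, panmictic model. The function names and the value of theta (the population-scaled mutation rate for the region) are assumptions chosen for illustration; they do not come from Lawrie and Petrov's paper, and purifying selection would reduce the counts further.

```python
def watterson_a(n):
    """Harmonic-type constant a_n = sum_{i=1}^{n-1} 1/i for a sample of n sequences."""
    return sum(1.0 / i for i in range(1, n))


def expected_segregating_sites(n, theta):
    """Expected number of segregating sites in n sequences under the
    standard neutral model: E[S_n] = theta * a_n."""
    return theta * watterson_a(n)


def expected_new_sites(n, theta):
    """Expected number of additional segregating sites gained by adding
    one more sequence to a sample of n. This equals theta / n."""
    return expected_segregating_sites(n + 1, theta) - expected_segregating_sites(n, theta)


if __name__ == "__main__":
    theta = 5.0  # assumed value, for illustration only
    for n in (2, 10, 100, 1000):
        print(n, expected_new_sites(n, theta))
```

The gain from each extra sequence shrinks as theta / n, so going from 1,000 to 1,001 sequences is expected to add a thousand times fewer new polymorphisms than going from 1 to 2.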

A proposed paradigm

Lawrie and Petrov demonstrate that studying polymorphisms by sampling many individuals across several related species (rather than sampling more individuals within a single species) could increase the observed number of polymorphisms in a region, and therefore, could increase the power to quantify the strength of selection (Figure 3) – as long as the selective forces in the genomic region are sufficiently similar across the different species.

Figure 3. The benefits of studying polymorphisms in many populations, rather than within a single population. Three populations (A, B, and C) diverge from an ancestral population, D. The genealogy of a single region is shown (slanted lines) with mutations in the region denoted by orange slashes. Additional lineages sampled in population A are likely to coalesce recently with other lineages (for example, the red clade in population A) and, therefore, carry few mutations that have not already been observed in the sample. In comparison, the same number of lineages sampled from a second population are likely to carry additional independent polymorphisms (for example, the red lineages in population B). If the selective pressures at the locus in populations A and B are similar, then the SFS in the two populations should be similar, and the additional lineages in B can provide additional information about the SFS. For example, if the demographic histories and selective pressures at the locus are identical in populations A and B, and if the samples from populations A and B are sufficiently diverged, then a sample of K lineages from each population, A and B, will contain double the number of independent polymorphisms that are observed in a sample of K lineages from population A alone, providing double the number of mutations that can be used to estimate the SFS.

The need for sampling depth and breadth

Without getting bogged down in the details, it’s the rare variants that are often the most important for quantifying the effects of purifying selection, so one still has to sample deeply within each species; however, overall, sampling from additional species is a more efficient way of increasing the absolute number of variants that can be used to estimate the SFS in a region, compared with sampling more deeply within the same species.
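The efficiency argument can be sketched with a back-of-the-envelope calculation. The Python snippet below compares the expected number of polymorphisms from 2K sequences sampled in one population against K sequences sampled in each of two populations. It is only an illustration under strong assumptions that are mine rather than the paper's: the standard neutral model, the same theta in both populations, and populations diverged enough that they share no polymorphisms. The function names are hypothetical.

```python
def watterson_a(n):
    """a_n = sum_{i=1}^{n-1} 1/i for a sample of n sequences."""
    return sum(1.0 / i for i in range(1, n))


def sites_one_population(total_samples, theta):
    """Expected polymorphisms when all sequences come from a single population."""
    return theta * watterson_a(total_samples)


def sites_two_populations(samples_per_population, theta):
    """Expected polymorphisms when the same total effort is split evenly over
    two populations, assuming (for illustration) identical theta and no
    polymorphisms shared between the populations."""
    return 2.0 * theta * watterson_a(samples_per_population)


if __name__ == "__main__":
    theta = 5.0  # assumed value, for illustration only
    k = 50       # sequences per population in the two-population design
    print("one population, 2K sequences:", sites_one_population(2 * k, theta))
    print("two populations, K each:     ", sites_two_populations(k, theta))
```

Under these assumptions, doubling the sample within one population only adds the slowly growing tail of the harmonic sum, while adding a second population doubles the expected count, which is the doubling described in the Figure 3 caption.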

The simulations and heuristic arguments presented by Lawrie and Petrov consider idealized cases for simplicity; however, the usefulness of approaches that consider polymorphisms across multiple species has been demonstrated in methods such as the McDonald-Kreitman test (McDonald and Kreitman, 1991), which have long been important tools for studying selection. More recent empirical applications of approaches that consider information about polymorphisms across multiple species appear to do a good job of quantifying selective pressures across genomes (Wilson et al., 2011; Gronau et al., 2013), even when species are closely related (De Maio et al., 2013). Overall, the simulations and arguments presented in Lawrie and Petrov’s paper provide useful guidelines for researchers interested in identifying and quantifying selective forces, and their recommendation to sample deeply within species and broadly across many species comes at a time when such analyses are becoming increasingly practical, given the recent availability of sequencing data from many species.

References:

  1. De Maio, N., Schlötterer, C., and Kosiol, C. (2013). Linking great apes genome evolution across time scales using polymorphism-aware phylogenetic models. Molecular Biology and Evolution 30:2249-2262.
  2. Gronau, I., Arbiza, L., Mohammed, J., and Siepel, A. (2013). Inference of natural selection from interspersed genomic elements based on polymorphism and divergence. Molecular Biology and Evolution 30:1159-1171.
  3. Lawrie, D.S. and Petrov, D.A. (2014). Comparative population genomics: power and principles for the inference of functionality. Trends in Genetics 30:133-139.
  4. Watterson, G.A. (1975). On the number of segregating sites in genetical models without recombination. Theoretical Population Biology 7:256-276.
  5. Wilson, D.J., Hernandez, R.D., Andolfatto, P., and Przeworski, M. (2011). A population genetics-phylogenetics approach to inferring natural selection in coding sequences. PLoS Genetics 7:e1002395.

Paper author: David Lawrie was a graduate student in Dmitri Petrov’s lab. He is now a postdoc at USC.