ResearchHub | Open Science Community

The Simons Genome Diversity Project: 300 genomes from 142 diverse populations

Swapan Mallick et al., Sep 20, 2016

Here we report the Simons Genome Diversity Project data set: high quality genomes from 300 individuals from 142 diverse populations. These genomes include at least 5.8 million base pairs that are not present in the human reference genome. Our analysis reveals key features of the landscape of human genome variation, including that the rate of accumulation of mutations has accelerated by about 5% in non-Africans compared to Africans since divergence. We show that the ancestors of some pairs of present-day human populations were substantially separated by 100,000 years ago, well before the archaeologically attested onset of behavioural modernity. We also demonstrate that indigenous Australians, New Guineans and Andamanese do not derive substantial ancestry from an early dispersal of modern humans; instead, their modern human ancestry is consistent with coming from the same source as that of other non-Africans.

Deep whole-genome sequencing of 300 individuals from 142 diverse populations provides insights into key population genetic parameters, shows that all modern human ancestry outside of Africa including in Australasians is consistent with descending from a single founding population, and suggests a higher rate of accumulation of mutations in non-Africans compared to Africans since divergence.

Three international collaborations reporting in this issue of Nature describe 787 high-quality genomes from individuals from geographically diverse populations. David Reich and colleagues analysed whole-genome sequences of 300 individuals from 142 populations. Their findings include an accelerated estimated rate of accumulation of mutations in non-Africans compared to Africans since divergence, and that indigenous Australians, New Guineans and Andamanese do not derive substantial ancestry from an early dispersal of modern humans but from the same source as that of other non-Africans.
Eske Willerslev and colleagues obtained whole-genome data for 83 Aboriginal Australians and 25 Papuans from the New Guinea Highlands. They estimate that Aboriginal Australians and Papuans diverged from Eurasian populations 51,000–72,000 years ago, following a single out-of-Africa dispersal. Luca Pagani et al. report on a dataset of 483 high-coverage human genomes from 148 populations worldwide, including 379 new genomes from 125 populations. Their analyses support the model by which all non-African populations derive most of their genetic ancestry from a single recent migration out of Africa, although a Papuan contribution suggests a trace of an earlier human expansion.

Genetics

Ecology

Identifying Personal Genomes by Surname Inference

Melissa Gymrek et al., Jan 17, 2013

Anonymity Compromised: The balance between maintaining individual privacy and sharing genomic information for research purposes has been a topic of considerable controversy. Gymrek et al. (p. 321; see the Policy Forum by Rodriguez et al.) demonstrate that the anonymity of participants (and their families) can be compromised by analyzing Y-chromosome sequences from public genetic genealogy Web sites that contain (sometimes distant) relatives with the same surname. Short tandem repeats (STRs) on the Y chromosome of a target individual (whose sequence was freely available and identified in GenBank) were compared with information in public genealogy Web sites to determine the shortest time to the most recent common ancestor and find the most likely surname, which, when combined with age and state of residency, identified the individual. When STRs from 911 individuals were used as the starting points, the analysis projected a success rate of 12% within the U.S. male population with Caucasian ancestry. Further analysis of detailed pedigrees from one collection revealed that families of individuals whose genomes are in public repositories could be identified with high probability.
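The surname-matching step described above can be illustrated with a minimal sketch. This is a hypothetical simplification, not the authors' method: instead of estimating the time to the most recent common ancestor, it ranks surname-labelled genealogy records by the summed absolute difference in repeat counts over the Y-STR markers shared with the target. The function names, the `min_shared` threshold, and any marker values you feed it are illustrative assumptions.

```python
"""Hypothetical sketch: rank genealogy surnames by Y-STR haplotype similarity.

Not the published pipeline (which estimated time to the most recent common
ancestor); it only illustrates matching a target haplotype against records.
"""
from typing import Dict, List, Tuple

Haplotype = Dict[str, int]  # STR marker name -> repeat count


def str_distance(a: Haplotype, b: Haplotype) -> Tuple[int, int]:
    """Return (sum of absolute repeat-count differences, number of markers
    compared), using only markers typed in both haplotypes."""
    shared = a.keys() & b.keys()
    return sum(abs(a[m] - b[m]) for m in shared), len(shared)


def rank_surnames(target: Haplotype,
                  records: List[Tuple[str, Haplotype]],
                  min_shared: int = 3) -> List[Tuple[str, int]]:
    """Keep the closest record per surname, then sort surnames by distance
    (ties broken alphabetically)."""
    best: Dict[str, int] = {}
    for surname, hap in records:
        dist, n_shared = str_distance(target, hap)
        if n_shared < min_shared:
            continue  # too few overlapping markers for a meaningful comparison
        if surname not in best or dist < best[surname]:
            best[surname] = dist
    return sorted(best.items(), key=lambda item: (item[1], item[0]))
```

Comparing raw distance sums is only fair when records share similar marker panels; the `min_shared` filter is a crude guard against records typed at very few markers.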

Genetics

History

The Histone Deacetylase SIRT6 Is a Tumor Suppressor that Controls Cancer Metabolism

Carlos Sebastián et al., Dec 1, 2012

Reprogramming of cellular metabolism is a key event during tumorigenesis. Although this metabolic switch (the Warburg effect) has been known for decades, the molecular mechanisms regulating it remained unexplored. Here, we identify SIRT6 as a tumor suppressor that regulates aerobic glycolysis in cancer cells. Importantly, loss of SIRT6 leads to tumor formation without activation of known oncogenes, whereas transformed SIRT6-deficient cells display increased glycolysis and tumor growth, suggesting that SIRT6 plays a role in both establishment and maintenance of cancer. By using a conditional SIRT6 allele, we show that SIRT6 deletion in vivo increases the number, size, and aggressiveness of tumors. SIRT6 also functions as a regulator of ribosome metabolism by corepressing MYC transcriptional activity. Lastly, SIRT6 is selectively downregulated in several human cancers, and expression levels of SIRT6 predict prognosis and tumor-free survival rates, highlighting SIRT6 as a critical modulator of cancer metabolism. Our studies reveal SIRT6 to be a potent tumor suppressor acting to suppress cancer metabolism.

Genetics

Epidemiology

EWS-FLI1 Utilizes Divergent Chromatin Remodeling Mechanisms to Directly Activate or Repress Enhancer Elements in Ewing Sarcoma

Nicolò Riggi et al., Oct 30, 2014

The aberrant transcription factor EWS-FLI1 drives Ewing sarcoma, but its molecular function is not completely understood. We find that EWS-FLI1 reprograms gene regulatory circuits in Ewing sarcoma by directly inducing or repressing enhancers. At GGAA repeat elements, which lack evolutionary conservation and regulatory potential in other cell types, EWS-FLI1 multimers induce chromatin opening and create de novo enhancers that physically interact with target promoters. Conversely, EWS-FLI1 inactivates conserved enhancers containing canonical ETS motifs by displacing wild-type ETS transcription factors. These divergent chromatin-remodeling patterns repress tumor suppressors and mesenchymal lineage regulators while activating oncogenes and potential therapeutic targets, such as the kinase VRK1. Our findings demonstrate how EWS-FLI1 establishes an oncogenic regulatory program governing both tumor survival and differentiation.

Genetics

Molecular Biology

Abundant contribution of short tandem repeats to gene expression variation in humans

Melissa Gymrek et al., Dec 7, 2015

The contribution of repetitive elements to quantitative human traits is largely unknown. Here we report a genome-wide survey of the contribution of short tandem repeats (STRs), which constitute one of the most polymorphic and abundant repeat classes, to gene expression in humans. Our survey identified 2,060 significant expression STRs (eSTRs). These eSTRs were replicable in orthogonal populations and expression assays. We used variance partitioning to disentangle the contribution of eSTRs from that of linked SNPs and indels and found that eSTRs contribute 10-15% of the cis heritability mediated by all common variants. Further functional genomic analyses showed that eSTRs are enriched in conserved regions, colocalize with regulatory elements and may modulate certain histone modifications. By analyzing known genome-wide association study (GWAS) signals and searching for new associations in 1,685 whole genomes from deeply phenotyped individuals, we found that eSTRs are enriched in various clinically relevant conditions. These results highlight the contribution of STRs to the genetic architecture of quantitative human traits.

Genetics

Molecular Biology

Combinatorial Patterning of Chromatin Regulators Uncovered by Genome-wide Location Analysis in Human Cells

Oren Ram et al., Dec 1, 2011

Hundreds of chromatin regulators (CRs) control chromatin structure and function by catalyzing and binding histone modifications, yet the rules governing these key processes remain obscure. Here, we present a systematic approach to infer CR function. We developed ChIP-string, a meso-scale assay that combines chromatin immunoprecipitation with a signature readout of 487 representative loci. We applied ChIP-string to screen 145 antibodies, thereby identifying effective reagents, which we used to map the genome-wide binding of 29 CRs in two cell types. We found that specific combinations of CRs colocalize in characteristic patterns at distinct chromatin environments, at genes of coherent functions, and at distal regulatory elements. Across cell types, CRs redistribute to different loci but maintain their modular and combinatorial associations. Our work provides a multiplex method that substantially enhances the ability to monitor CR binding, presents a large resource of CR maps, and reveals common principles for combinatorial CR function.

Genetics

Molecular Biology

lobSTR: A short tandem repeat profiler for personal genomes

Melissa Gymrek et al., Apr 20, 2012

Short tandem repeats (STRs) have a wide range of applications, including medical genetics, forensics, and genetic genealogy. High-throughput sequencing (HTS) has the potential to profile hundreds of thousands of STR loci. However, mainstream bioinformatics pipelines are inadequate for the task. These pipelines treat STR mapping as gapped alignment, which results in cumbersome processing times and a biased sampling of STR alleles. Here, we present lobSTR, a novel method for profiling STRs in personal genomes. lobSTR harnesses concepts from signal processing and statistical learning to avoid gapped alignment and to address the specific noise patterns in STR calling. The speed and reliability of lobSTR exceed the performance of current mainstream algorithms for STR profiling. We validated lobSTR's accuracy by measuring its consistency in calling STRs from whole-genome sequencing of two biological replicates from the same individual, by tracing Mendelian inheritance patterns in STR alleles in whole-genome sequencing of a HapMap trio, and by comparing lobSTR results to traditional molecular techniques. Encouraged by the speed and accuracy of lobSTR, we used the algorithm to conduct a comprehensive survey of STR variations in a deeply sequenced personal genome. We traced the mutation dynamics of close to 100,000 STR loci and observed more than 50,000 STR variations in a single genome. lobSTR's implementation is an end-to-end solution. The package accepts raw sequencing reads and provides the user with the genotyping results. It is written in C/C++, includes multi-threading capabilities, and is compatible with the BAM format.
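The trio-based validation mentioned above rests on a simple rule: each of a child's two STR alleles must be explainable by one allele inherited from each parent. A minimal sketch of that check follows. It is an illustrative assumption, not lobSTR's implementation; the locus IDs and the encoding of alleles as repeat copy numbers are hypothetical.

```python
"""Hypothetical sketch of a trio Mendelian-consistency check for STR calls."""
from typing import Dict, Tuple

# Assumed representation: a diploid STR genotype as two allele lengths.
Genotype = Tuple[int, int]
Trio = Tuple[Genotype, Genotype, Genotype]  # (child, father, mother)


def is_mendelian(child: Genotype, father: Genotype, mother: Genotype) -> bool:
    """True if one child allele can come from the father and the other
    from the mother, in either order."""
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)


def mendelian_consistency(calls: Dict[str, Trio]) -> float:
    """Fraction of loci whose (child, father, mother) calls obey
    Mendelian inheritance."""
    if not calls:
        raise ValueError("no loci to evaluate")
    consistent = sum(is_mendelian(c, f, m) for c, f, m in calls.values())
    return consistent / len(calls)
```

A violation can reflect either a genotyping error or a true de novo mutation, so a consistency rate like this is a lower bound on call accuracy rather than an exact measure.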

Genetics

Molecular Biology

Private and sub-family specific mutations of founder haplotypes in the BXD family reveal phenotypic consequences relevant to health and disease

David Ashbrook et al., Apr 21, 2022

The BXD recombinant inbred (RI) mouse strains are the largest and most deeply phenotyped inbred panel of vertebrate organisms. RIs allow phenotyping of isogenic individuals across virtually any environment or treatment. We performed whole-genome sequencing (WGS) and generated a compendium of SNPs, indels, short tandem repeats, and structural variants in these strains, and used them to analyze phenomic data accumulated over the past 50 years. We show that BXDs segregate >6 million variants with high minor allele frequency that are derived from the C57BL/6J and DBA/2J founders, and we use this dense variant set to define ‘infinite’ marker maps and a novel family-level pangenome. We additionally characterize the rates and spectra of de novo variants that have accumulated over 20-200 generations of inbreeding and have largely been ignored previously. Overall, the uniquely rich phenome, when linked with WGS, enables a new type of integrative modeling of genotype-to-phenotype relations.

Genetics

Demography

Quantitative analysis of population-scale family trees using millions of relatives

Joanna Kaplanis et al., Feb 7, 2017

Family trees have vast applications in multiple fields, from genetics to anthropology and economics. However, the collection of extended family trees is tedious and usually relies on resources with limited geographical scope and complex data usage restrictions. Here, we collected 86 million profiles from publicly available online data from genealogy enthusiasts. After extensive cleaning and validation, we obtained population-scale family trees, including a single pedigree of 13 million individuals. We leveraged the data to partition the genetic architecture of longevity by inspecting millions of relative pairs and to provide insights into population genetics theories on the dispersion of families. We also report a simple digital procedure to overlay other datasets with our resource in order to empower studies with population-scale genealogical data.

One-sentence summary: Using massive crowd-sourced genealogy data, we created a population-scale family tree resource for scientific studies.
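Analyses of relative pairs like those described above need a measure of how closely two people in a pedigree are related. A minimal sketch, assuming the pedigree is a dictionary mapping each person to (father, mother) with `None` for an unknown parent, computes the classical kinship coefficient by the standard recursion: a person's self-kinship is (1 + kinship of the two parents) / 2, and for two distinct people one recurses through the parents of whichever person cannot be an ancestor of the other. The data layout and function names are hypothetical, this is not the authors' pipeline, and Python's recursion limit would make it unsuitable as written for very deep pedigrees.

```python
"""Hypothetical sketch: kinship coefficients from a pedigree dictionary."""
from functools import lru_cache
from typing import Callable, Dict, Optional, Tuple

# Assumed layout: person ID -> (father ID, mother ID); None = unknown parent.
Pedigree = Dict[str, Tuple[Optional[str], Optional[str]]]


def kinship_calculator(
        pedigree: Pedigree) -> Callable[[Optional[str], Optional[str]], float]:
    """Return phi(i, j), the kinship coefficient of individuals i and j."""

    def parents_of(x: str) -> Tuple[Optional[str], Optional[str]]:
        return pedigree.get(x, (None, None))

    @lru_cache(maxsize=None)
    def depth(x: str) -> int:
        # Generation depth: 0 for founders and strictly larger for any
        # descendant, so a deeper person is never an ancestor of a shallower one.
        known = [p for p in parents_of(x) if p is not None]
        return 1 + max(depth(p) for p in known) if known else 0

    @lru_cache(maxsize=None)
    def phi(i: Optional[str], j: Optional[str]) -> float:
        if i is None or j is None:
            return 0.0  # an unknown parent contributes no shared ancestry
        if i == j:
            father, mother = parents_of(i)
            return 0.5 * (1.0 + phi(father, mother))
        if depth(i) > depth(j):
            i, j = j, i  # always recurse through the deeper individual
        father, mother = parents_of(j)
        return 0.5 * (phi(i, father) + phi(i, mother))

    return phi
```

For non-inbred individuals the familiar coefficient of relationship is twice the kinship coefficient, so full siblings and parent-offspring pairs (kinship 1/4) have relatedness 1/2.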

Genetics

History

Polymorphic short tandem repeats make widespread contributions to blood and serum traits

Jonathan Margoliash et al., Aug 3, 2022

Short tandem repeats (STRs), genomic regions each consisting of a sequence of 1-6 base pairs repeated in succession, represent one of the largest sources of human genetic variation. However, many STR effects are not captured well by standard genome-wide association studies (GWAS) or downstream analyses that are mostly based on single nucleotide polymorphisms (SNPs). To study the involvement of STRs in complex traits, we imputed genotypes for 445,720 autosomal STRs into genotype array data from 408,153 White British UK Biobank participants and tested for association with 44 blood and serum biomarker phenotypes. We used two fine-mapping methods, SuSiE and FINEMAP, to identify 119 high-confidence STR-trait associations across 93 unique STRs predicted as causal variants under all fine-mapping settings tested. Using these results, we estimate that STRs account for 5.2-7.6% of causal variants identifiable from GWAS signals for these traits. Our high confidence STR-trait associations implicate STRs in some of the strongest hits for multiple phenotypes, including a CTG repeat in APOB associated with circulating apolipoprotein B levels, a CGG repeat in the promoter of CBL associated with multiple platelet traits and a poly-A repeat in TAOK1 associated with mean platelet volume. Replication analyses in additional population groups and orthogonal expression data further support the role of a subset of the candidate STRs we identify. Together, our study suggests that polymorphic tandem repeats make widespread contributions to complex traits, provides a set of stringently selected candidate causal STRs, and demonstrates the need to routinely consider a more complete view of human genetic variation in GWAS.
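To make the STR definition above concrete, here is a minimal greedy scanner for perfect tandem repeats with a unit length of 1 to 6 bp. It is a hypothetical illustration only: the `min_copies` threshold is an arbitrary assumption, interrupted or impure repeats are ignored, and it is unrelated to the imputation and fine-mapping methods used in the study.

```python
"""Hypothetical sketch: find perfect tandem repeats (unit 1-6 bp) in DNA."""
from typing import List, NamedTuple


class TandemRepeat(NamedTuple):
    start: int   # 0-based start position in the sequence
    unit: str    # repeat unit (motif)
    copies: int  # number of consecutive, uninterrupted full copies


def find_perfect_strs(seq: str, max_unit: int = 6,
                      min_copies: int = 3) -> List[TandemRepeat]:
    """Scan left to right; at each position keep the unit whose run spans
    the most bases, then jump past that run."""
    seq = seq.upper()
    hits: List[TandemRepeat] = []
    i, n = 0, len(seq)
    while i < n:
        best = None
        for k in range(1, max_unit + 1):
            unit = seq[i:i + k]
            if len(unit) < k:
                break  # not enough sequence left for a unit this long
            copies = 1
            # Count further full copies; a short slice at the end never matches.
            while seq[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            span = copies * k
            # Strict ">" keeps the shortest unit when spans tie (A x8, not AA x4).
            if copies >= min_copies and (
                    best is None or span > best.copies * len(best.unit)):
                best = TandemRepeat(i, unit, copies)
        if best is not None:
            hits.append(best)
            i += best.copies * len(best.unit)
        else:
            i += 1
    return hits
```

Because the scan is greedy and phase-sensitive, a flanking base that happens to match the motif can shift the reported unit (for example, G followed by CAG repeats may be reported as GCA repeats); dedicated STR callers and reference catalogs handle such cases more carefully.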

Genetics

Cellular And Molecular Neuroscience