ResearchHub | Open Science Community

An integrated map of structural variation in 2,504 human genomes

Peter Sudmant et al., Sep 29, 2015

Structural variants are implicated in numerous diseases and make up the majority of varying nucleotides among human genomes. Here we describe an integrated set of eight structural variant classes comprising both balanced and unbalanced variants, which we constructed using short-read DNA sequencing data and statistically phased onto haplotype blocks in 26 human populations. Analysing this set, we identify numerous gene-intersecting structural variants exhibiting population stratification and describe naturally occurring homozygous gene knockouts that suggest the dispensability of a variety of human genes. We demonstrate that structural variants are enriched on haplotypes identified by genome-wide association studies and exhibit enrichment for expression quantitative trait loci. Additionally, we uncover appreciable levels of structural variant complexity at different scales, including genic loci subject to clusters of repeated rearrangement and complex structural variants with multiple breakpoints likely to have formed through individual mutational events. Our catalogue will enhance future studies into structural variant demography, functional impact and disease association.

The Structural Variation Analysis Group of The 1000 Genomes Project reports an integrated structural variation map based on discovery and genotyping of eight major structural variation classes in genomes for 2,504 unrelated individuals from across 26 populations. They characterize structural variation within and between populations and quantify its functional effect. The authors further create a phased reference panel that will be valuable for population genetic and disease association studies.
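
The enrichment of structural variants on GWAS haplotypes rests on linkage disequilibrium between a structural variant and nearby SNPs carried on the same phased haplotypes. The following is a minimal sketch that assumes biallelic sites coded 0/1 on each phased haplotype. It computes the r² statistic and applies a simple tag-SNP filter. The function names, the `rsA`/`rsB` identifiers and the 0.8 cutoff are illustrative assumptions and are not taken from the 1000 Genomes pipeline.

```python
def ld_r2(hap_x, hap_y):
    """Squared allelic correlation (r^2) between two biallelic sites.

    hap_x, hap_y: alleles (0/1) at each site, one entry per phased haplotype.
    Returns 0.0 if either site is monomorphic (r^2 is undefined there).
    """
    n = len(hap_x)
    p_x = sum(hap_x) / n
    p_y = sum(hap_y) / n
    p_xy = sum(1 for x, y in zip(hap_x, hap_y) if x == 1 and y == 1) / n
    d = p_xy - p_x * p_y  # linkage disequilibrium coefficient D
    denom = p_x * (1 - p_x) * p_y * (1 - p_y)
    return d * d / denom if denom > 0 else 0.0


def tagging_snps(sv_haps, snp_haps, min_r2=0.8):
    """Names of SNPs whose r^2 with the structural variant reaches min_r2 (assumed cutoff)."""
    return [name for name, haps in snp_haps.items() if ld_r2(sv_haps, haps) >= min_r2]


# Hypothetical example: six phased haplotypes, one deletion allele and two SNPs.
sv = [1, 0, 1, 0, 1, 0]
snps = {"rsA": [1, 0, 1, 0, 1, 0], "rsB": [1, 1, 0, 0, 1, 0]}
print(tagging_snps(sv, snps))  # ['rsA']
```

If a GWAS lead SNP tags a structural variant at high r², the structural variant is a candidate causal allele on that haplotype. High r² by itself does not establish causality.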

Genetics

Demography

The complete genome sequence of a Neanderthal from the Altai Mountains

Kay Prüfer et al., Dec 18, 2013

Genetics

Paleontology

A High-Coverage Genome Sequence from an Archaic Denisovan Individual

Matthias Meyer et al., Sep 1, 2012

We present a DNA library preparation method that has allowed us to reconstruct a high-coverage (30×) genome sequence of a Denisovan, an extinct relative of Neandertals. The quality of this genome allows a direct estimation of Denisovan heterozygosity indicating that genetic diversity in these archaic hominins was extremely low. It also allows tentative dating of the specimen on the basis of "missing evolution" in its genome, detailed measurements of Denisovan and Neandertal admixture into present-day human populations, and the generation of a near-complete catalog of genetic changes that swept to high frequency in modern humans since their divergence from Denisovans.

Genetics

Paleontology

Ancient human genomes suggest three ancestral populations for present-day Europeans

Iosif Lazaridis et al., Sep 1, 2014

A sequencing study comparing ancient and contemporary genomes reveals that most present-day Europeans derive from at least three highly differentiated populations: west European hunter-gatherers, ancient north Eurasians (related to Upper Palaeolithic Siberians) and early European farmers of mainly Near Eastern origin.

By sequencing and comparing the genomes of nine ancient Europeans that bridge the transition to agriculture in Europe between 8,000 and 7,000 years ago, David Reich and colleagues show that most present-day Europeans derive from at least three highly differentiated populations: west European hunter-gatherers, ancient north Eurasians (related to Upper Palaeolithic Siberians) and early European farmers of mainly Near Eastern origin. They further propose that early European farmers had about 44% ancestry from a 'basal Eurasian' population that split before the diversification of other non-African lineages. These results raise interesting new questions, for instance that of where and when the Near Eastern farmers mixed with European hunter-gatherers to produce the early European farmers.

We sequenced the genomes of a ∼7,000-year-old farmer from Germany and eight ∼8,000-year-old hunter-gatherers from Luxembourg and Sweden. We analysed these and other ancient genomes [1–4] with 2,345 contemporary humans to show that most present-day Europeans derive from at least three highly differentiated populations: west European hunter-gatherers, who contributed ancestry to all Europeans but not to Near Easterners; ancient north Eurasians related to Upper Palaeolithic Siberians [3], who contributed to both Europeans and Near Easterners; and early European farmers, who were mainly of Near Eastern origin but also harboured west European hunter-gatherer related ancestry. We model these populations’ deep relationships and show that early European farmers had ∼44% ancestry from a ‘basal Eurasian’ population that split before the diversification of other non-African lineages.
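
Admixture proportions, such as a share of 'basal Eurasian' ancestry, are commonly estimated from f-statistics computed on allele frequencies. The following is a generic sketch of the f4-ratio estimator, α = f4(A, O; X, C) / f4(A, O; B, C). Here the target X is modelled as a mixture of a lineage related to B (proportion α) and a lineage related to C. O is an outgroup, and A is a reference population closer to B. This is not the authors' own admixture modelling. The population labels and the allele frequencies in the check are hypothetical.

```python
from statistics import fmean


def f4(p_a, p_b, p_c, p_d):
    """f4(A, B; C, D): mean over SNPs of (a - b) * (c - d), using per-SNP allele frequencies."""
    return fmean((a - b) * (c - d) for a, b, c, d in zip(p_a, p_b, p_c, p_d))


def f4_ratio(p_a, p_o, p_x, p_b, p_c):
    """Estimate the fraction of X's ancestry from the B-related source.

    Assumes X = alpha * (B-like) + (1 - alpha) * (C-like), and that A shares
    drift with B but not with C relative to the outgroup O.
    """
    denom = f4(p_a, p_o, p_b, p_c)
    if denom == 0:
        raise ValueError("f4(A, O; B, C) is zero; the ratio is undefined")
    return f4(p_a, p_o, p_x, p_c) / denom


# Hypothetical frequencies at three SNPs; X is built as an exact 30/70 mixture of B and C.
a, o = [0.9, 0.2, 0.6], [0.1, 0.5, 0.3]
b, c = [0.8, 0.1, 0.7], [0.3, 0.4, 0.2]
x = [0.3 * bi + 0.7 * ci for bi, ci in zip(b, c)]
print(round(f4_ratio(a, o, x, b, c), 6))  # 0.3
```

Real estimates average over hundreds of thousands of SNPs and report block-jackknife standard errors. The few toy sites here only demonstrate the algebra.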

Genetics

History

Great ape genetic diversity and population history

Javier Prado-Martinez et al., Jul 1, 2013

High-coverage sequencing of 79 (wild and captive) individuals representing all six non-human great ape species has identified over 88 million single nucleotide polymorphisms, providing insight into ape genetic variation and evolutionary history and enabling comparison with human genetic diversity.

In an effort to provide insights into great ape genetic variation, the authors sequence 79 wild- and captive-born individuals from across all six great ape species and seven subspecies. Their data and analyses shed light on population structure and gene flow, inbreeding, inferred dynamics of effective population sizes and the differences in the rate of gene loss among the great apes. This new catalogue of great ape genome diversity provides a valuable resource for evolutionary and conservation studies.

Most great ape genetic variation remains uncharacterized [1,2]; however, its study is critical for understanding population history [3–6], recombination [7], selection [8] and susceptibility to disease [9,10]. Here we sequence to high coverage a total of 79 wild- and captive-born individuals representing all six great ape species and seven subspecies and report 88.8 million single nucleotide polymorphisms. Our analysis provides support for genetically distinct populations within each species, signals of gene flow, and the split of common chimpanzees into two distinct groups: Nigeria–Cameroon/western and central/eastern populations. We find extensive inbreeding in almost all wild populations, with eastern gorillas being the most extreme. Inferred effective population sizes have varied radically over time in different lineages and this appears to have a profound effect on the genetic diversity at, or close to, genes in almost all species. We discover and assign 1,982 loss-of-function variants throughout the human and great ape lineages, determining that the rate of gene loss has not been different in the human branch compared to other internal branches in the great ape phylogeny. This comprehensive catalogue of great ape genome diversity provides a framework for understanding evolution and a resource for more effective management of wild and captive great ape populations.

Genetics

Molecular Biology

Resolving the complexity of the human genome using single-molecule sequencing

Mark Chaisson et al., Nov 10, 2014

Single-molecule, real-time DNA sequencing is used to analyse a haploid human genome (CHM1), thus closing or extending more than half of the remaining 164 euchromatic gaps in the human genome; the complete sequences of euchromatic structural variants (including inversions, complex insertions and tandem repeats) are resolved at the base-pair level, suggesting that a greater complexity of the human genome can now be accessed.

The human genome is considered sequenced, yet more than 160 euchromatic gaps remain and many aspects of its structural variation are poorly understood. Evan Eichler and colleagues sequenced and analysed a haploid human genome (CHM1) using single-molecule, real-time (SMRT) DNA sequencing and, by doing so, closed (or in some cases extended) more than half of the remaining gaps. They also resolved the complete sequence of numerous euchromatic structural variants at the base-pair level, revealing inversions, complex insertions and long tracts of tandem repeats, some of them previously unknown. Thanks to the longer-read sequencing technology applied here, the complexity of the human genome that stems from variation of longer and more complex repetitive DNA can now be largely resolved.

The human genome is arguably the most complete mammalian reference assembly [1–3], yet more than 160 euchromatic gaps remain [4–6] and aspects of its structural variation remain poorly understood ten years after its completion [7–9]. To identify missing sequence and genetic variation, here we sequence and analyse a haploid human genome (CHM1) using single-molecule, real-time DNA sequencing [10]. We close or extend 55% of the remaining interstitial gaps in the human GRCh37 reference genome; 78% of these gaps carried long runs of degenerate short tandem repeats, often several kilobases in length, embedded within (G+C)-rich genomic regions. We resolve the complete sequence of 26,079 euchromatic structural variants at the base-pair level, including inversions, complex insertions and long tracts of tandem repeats. Most have not been previously reported, with the greatest increases in sensitivity occurring for events less than 5 kilobases in size. Compared to the human reference, we find a significant insertional bias (3:1) in regions corresponding to complex insertions and long short tandem repeats. Our results suggest a greater complexity of the human genome in the form of variation of longer and more complex repetitive DNA that can now be largely resolved with the application of this longer-read sequencing technology.

Genetics

Molecular Biology

Diversity of Human Copy Number Variation and Multicopy Genes

Peter Sudmant et al., Oct 28, 2010

Evolution, Gene Number, and Disease

Slight variations in the numbers of copies of genes influence human disease and other traits. Variants can be hard to detect when they lie in heavily duplicated, highly similar regions of sequence known as “dark matter.” Sudmant et al. (p. 641) developed methods to tease apart these duplicated regions by identifying singly unique nucleotide identifiers, positions that distinguish one duplicated copy from the others. Multicopy genes turned out to be among the most variable across human population groups, most notably genes involved in neurodevelopment and neurological disease. Such polymorphisms can be genotyped with specificity and may help us understand how variation in copy number affects human evolution and disease.
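
Singly unique nucleotides are positions where one paralogous copy carries a base that no other copy has. Reads that cover such positions can therefore be assigned to that specific copy. The following is a simplified sketch that makes two assumptions: the paralogs are already aligned without gaps, and read depth scales linearly with copy number. It uses these positions to estimate the copy number of a single paralog. `find_suns`, `copy_number_from_depth` and the sequences in the example are hypothetical. The published approach works genome-wide rather than on one hand-made alignment.

```python
def find_suns(paralogs):
    """For each paralog, list alignment columns where its base occurs in no other paralog.

    paralogs: dict mapping a copy name to its aligned sequence (equal lengths, no gaps).
    """
    lengths = {len(seq) for seq in paralogs.values()}
    if len(lengths) != 1:
        raise ValueError("paralog sequences must be aligned to a common length")
    (length,) = lengths
    suns = {name: [] for name in paralogs}
    for i in range(length):
        column = {name: seq[i] for name, seq in paralogs.items()}
        for name, base in column.items():
            # A SUN for `name`: no other paralog shares this base in this column.
            if all(other_base != base for other, other_base in column.items() if other != name):
                suns[name].append(i)
    return suns


def copy_number_from_depth(sun_depths, diploid_depth):
    """Estimate an integer copy number from read depth at one paralog's SUNs.

    Assumes depth is linear in copy number and that diploid_depth is the mean
    depth over sequence known to be present in two copies.
    """
    mean_depth = sum(sun_depths) / len(sun_depths)
    return round(2 * mean_depth / diploid_depth)


aligned = {"copyA": "ACGTAC", "copyB": "ACGTTC", "copyC": "AGGTAC"}
print(find_suns(aligned))                         # {'copyA': [], 'copyB': [4], 'copyC': [1]}
print(copy_number_from_depth([60, 58, 62], 30))   # 4
```

A copy with no SUNs, such as `copyA` here, cannot be genotyped individually by this approach. It can only be counted together with the copies it is identical to.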

Genetics

Demography

Copy number variation detection and genotyping from exome sequence data

Niklas Krumm et al., May 14, 2012

While exome sequencing is readily amenable to single-nucleotide variant discovery, the sparse and nonuniform nature of the exome capture reaction has hindered exome-based detection and characterization of genic copy number variation. We developed a novel method using singular value decomposition (SVD) normalization to discover rare genic copy number variants (CNVs) as well as genotype copy number polymorphic (CNP) loci with high sensitivity and specificity from exome sequencing data. We estimate the precision of our algorithm using 122 trios (366 exomes) and show that this method can be used to reliably predict (94% overall precision) both de novo and inherited rare CNVs involving three or more consecutive exons. We demonstrate that exome-based genotyping of CNPs strongly correlates with whole-genome data (median r² = 0.91), especially for loci with fewer than eight copies, and can estimate the absolute copy number of multi-allelic genes with high accuracy (78% call level). The resulting user-friendly computational pipeline, CoNIFER (copy number inference from exome reads), can reliably be used to discover disruptive genic CNVs missed by standard approaches and should have broad application in human genetic studies of disease.
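
SVD normalization rests on one idea. In an exon-by-sample coverage matrix, the strongest singular components mostly capture systematic capture and batch noise rather than real copy number changes, so subtracting them exposes true CNV signal. The following pure-Python sketch assumes an RPKM matrix with exons as rows and samples as columns. It standardizes each exon across samples and removes the top `k` singular components using power iteration with deflation. It then reports runs of at least three consecutive exons that fall beyond a z-score threshold. The function names, the 1.5 threshold and the use of power iteration instead of a library SVD are assumptions of this illustration. The published CoNIFER pipeline differs in details such as exon filtering, smoothing and how `k` is chosen.

```python
import math
import random
from statistics import median, pstdev


def zrpkm(rpkm):
    """Standardize each exon (row) across samples as (value - median) / SD."""
    out = []
    for row in rpkm:
        med, sd = median(row), pstdev(row)
        out.append([(x - med) / sd if sd > 0 else 0.0 for x in row])
    return out


def remove_top_components(mat, k, iters=500, seed=0):
    """Subtract the k largest singular components using power iteration with deflation."""
    rng = random.Random(seed)
    a = [row[:] for row in mat]
    m, n = len(a), len(a[0])
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]  # random start for the right singular vector
        for _ in range(iters):
            u = [sum(a[i][j] * v[j] for j in range(n)) for i in range(m)]  # u = A v
            w = [sum(a[i][j] * u[i] for i in range(m)) for j in range(n)]  # w = A^T A v
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                return a  # matrix is already zero; nothing left to remove
            v = [x / norm for x in w]
        u = [sum(a[i][j] * v[j] for j in range(n)) for i in range(m)]  # u = sigma * u_hat
        for i in range(m):
            for j in range(n):
                a[i][j] -= u[i] * v[j]  # deflate: A <- A - sigma * u_hat * v^T
    return a


def call_cnvs(svd_z, threshold=1.5, min_exons=3):
    """Return (sample, first_exon, last_exon, 'dup' | 'del') for runs of >= min_exons exons."""
    calls = []
    n_exons, n_samples = len(svd_z), len(svd_z[0])
    for s in range(n_samples):
        start, kind = None, None
        for e in range(n_exons + 1):  # the extra step acts as a sentinel that closes open runs
            val = svd_z[e][s] if e < n_exons else 0.0
            cur = "dup" if val > threshold else "del" if val < -threshold else None
            if cur != kind:
                if kind is not None and e - start >= min_exons:
                    calls.append((s, start, e - 1, kind))
                start, kind = e, cur
    return calls
```

A typical hypothetical run would be `call_cnvs(remove_top_components(zrpkm(rpkm), k=3))`. Removing more components suppresses more noise, but it can also erase copy number polymorphisms shared by many samples, so `k` is a tuning choice rather than a fixed constant.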

Genetics

Cancer Research

Mountain gorilla genomes reveal the impact of long-term population decline and inbreeding

Yali Xue et al., Apr 9, 2015

Genomes in the mist

The mountain gorilla is an iconic species that is at high risk of extinction. Xue et al. have sequenced 13 gorillas from two different populations to probe their genetic diversity. The genomes show large tracts of homozygosity and the loss of highly deleterious genetic variants, indicating population bottlenecks and inbreeding. This loss of genetic diversity appears to have started over 20,000 years ago and may have been caused by changes in climate and human-associated effects. Science, this issue p. 242
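
Long tracts of homozygosity are the usual genomic readout of inbreeding, because a long stretch without heterozygous sites points to recent shared ancestry between an individual's parents. The following is a deliberately simple sketch that assumes genotypes coded 0/1/2 at sorted positions and ends a run at every heterozygous site. It then summarizes the runs as F_ROH, the fraction of the genome covered by runs of homozygosity. Real ROH callers tolerate occasional heterozygous or missing calls and account for marker density. `runs_of_homozygosity`, `f_roh` and the toy data are hypothetical.

```python
def runs_of_homozygosity(positions, genotypes, min_length):
    """Return (start, end) spans of consecutive homozygous sites at least min_length bp long.

    positions: sorted base-pair coordinates; genotypes: 0 = hom ref, 1 = het, 2 = hom alt.
    Any heterozygous site ends the current run (no error tolerance).
    """
    runs = []
    start = end = None
    for pos, gt in zip(positions, genotypes):
        if gt == 1:  # heterozygous site breaks the run
            if start is not None and end - start >= min_length:
                runs.append((start, end))
            start = end = None
        else:
            if start is None:
                start = pos
            end = pos
    if start is not None and end - start >= min_length:  # close a run at the last site
        runs.append((start, end))
    return runs


def f_roh(runs, genome_length):
    """Fraction of the genome covered by runs of homozygosity (F_ROH)."""
    return sum(end - start for start, end in runs) / genome_length


runs = runs_of_homozygosity([100, 200, 300, 400, 500, 600], [0, 2, 0, 1, 0, 0], min_length=150)
print(runs)               # [(100, 300)]
print(f_roh(runs, 1000))  # 0.2
```

In real data the minimum run length is usually measured in megabases. Longer runs reflect more recent inbreeding, and shorter runs reflect older bottlenecks.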

Genetics

Ecology

Evolution of Human-Specific Neural SRGAP2 Genes by Incomplete Segmental Duplication

Megan Dennis et al., May 1, 2012

Summary

Gene duplication is an important source of phenotypic change and adaptive evolution. We leverage a haploid hydatidiform mole to identify highly identical sequences missing from the reference genome, confirming that the cortical development gene Slit-Robo Rho GTPase-activating protein 2 (SRGAP2) duplicated three times exclusively in humans. We show that the promoter and first nine exons of SRGAP2 duplicated from 1q32.1 (SRGAP2A) to 1q21.1 (SRGAP2B) ∼3.4 million years ago (mya). Two larger duplications later copied SRGAP2B to chromosome 1p12 (SRGAP2C) and to proximal 1q21.1 (SRGAP2D) ∼2.4 and ∼1 mya, respectively. Sequence and expression analyses show that SRGAP2C is the most likely duplicate to encode a functional protein and is among the most fixed human-specific duplicate genes. Our data suggest a mechanism where incomplete duplication created a novel gene function—antagonizing parental SRGAP2 function—immediately "at birth" 2–3 mya, which is a time corresponding to the transition from Australopithecus to Homo and the beginning of neocortex expansion.

Genetics

Molecular Biology
