ResearchHub | Open Science Community

A global reference for human genetic variation

Alexandra Roa et al.Sep 29, 2015

The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies. Results for the final phase of the 1000 Genomes Project are presented including whole-genome sequencing, targeted exome sequencing, and genotyping on high-density SNP arrays for 2,504 individuals across 26 populations, providing a global reference data set to support biomedical genetics. The 1000 Genomes Project has sought to comprehensively catalogue human genetic variation across populations, providing a valuable public genomic resource. The data obtained so far have found applications ranging from association studies and fine mapping studies to the filtering of likely neutral variants in rare-disease cohorts. The authors now report on the final phase of the project, phase 3, which covers previously uncharacterized areas of human genetic diversity in terms of the populations sampled and categories of characterized variation. The sample now includes more than 2,500 individuals from 26 global populations, with low coverage whole-genome and deep exome sequencing, as well as dense microarray genotyping. They find that while most common variants are shared across populations, rarer variants are often restricted to closely related populations. The authors also demonstrate the use of the phase 3 dataset as a reference panel for imputation to improve the resolution in genetic association studies.

Genetics

Molecular Biology

0

Paper

Save

Comprehensive genomic characterization defines human glioblastoma genes and core pathways

Roger McLendon et al.Sep 4, 2008

Human cancer cells typically harbour multiple chromosomal aberrations, nucleotide substitutions and epigenetic modifications that drive malignant transformation. The Cancer Genome Atlas (TCGA) pilot project aims to assess the value of large-scale multi-dimensional analysis of these molecular characteristics in human cancer and to provide the data rapidly to the research community. Here we report the interim integrative analysis of DNA copy number, gene expression and DNA methylation aberrations in 206 glioblastomas—the most common type of adult brain cancer—and nucleotide sequence aberrations in 91 of the 206 glioblastomas. This analysis provides new insights into the roles of ERBB2, NF1 and TP53, uncovers frequent mutations of the phosphatidylinositol-3-OH kinase regulatory subunit gene PIK3R1, and provides a network view of the pathways altered in the development of glioblastoma. Furthermore, integration of mutation, DNA methylation and clinical treatment data reveals a link between MGMT promoter methylation and a hypermutator phenotype consequent to mismatch repair deficiency in treated glioblastomas, an observation with potential clinical implications. Together, these findings establish the feasibility and power of TCGA, demonstrating that it can rapidly expand knowledge of the molecular basis of cancer. The Cancer Genome Atlas, a large-scale genomics project to catalogue cancer-linked mutations, is starting to produce results. Glioblastoma, the most common brain cancer, was the first target for the project and the initial results, published AOP on 4 September, are now in print. Genes newly implicated in glioblastoma include tumour suppressors (NF1, RB1, ATM and APC) and several tyrosine kinase genes. Glioblastoma is extremely resistant to therapy, hence the potential importance of the development of a possible model system. Zheng et al. report that mice lacking the tumour suppressors p53 and Pten develop tumours resembling human glioblastomas, associated with increased Myc protein levels. As well as offering a potential system for testing therapeutics, this points to c-Myc as a possible drug target. With a comprehensive analysis of sequencing data, DNA copy number, gene expression and DNA methylation in a large number of human glioblastomas, The Cancer Genome Atlas project initiative provides a broad overview of the genes and pathways that are altered in this cancer type.

Genetics

Molecular Biology

0

Paper

Save

Integrated genomic analyses of ovarian carcinoma

Abel González-Pérez et al.Jun 28, 2011

A catalogue of molecular aberrations that cause ovarian cancer is critical for developing and deploying therapies that will improve patients’ lives. The Cancer Genome Atlas project has analysed messenger RNA expression, microRNA expression, promoter methylation and DNA copy number in 489 high-grade serous ovarian adenocarcinomas and the DNA sequences of exons from coding genes in 316 of these tumours. Here we report that high-grade serous ovarian cancer is characterized by TP53 mutations in almost all tumours (96%); low prevalence but statistically recurrent somatic mutations in nine further genes including NF1, BRCA1, BRCA2, RB1 and CDK12; 113 significant focal DNA copy number aberrations; and promoter methylation events involving 168 genes. Analyses delineated four ovarian cancer transcriptional subtypes, three microRNA subtypes, four promoter methylation subtypes and a transcriptional signature associated with survival duration, and shed new light on the impact that tumours with BRCA1/2 (BRCA1 or BRCA2) and CCNE1 aberrations have on survival. Pathway analyses suggested that homologous recombination is defective in about half of the tumours analysed, and that NOTCH and FOXM1 signalling are involved in serous ovarian cancer pathophysiology. The Cancer Genome Atlas (TCGA) project reports here its analysis of messenger RNA and microRNA expression, promoter methylation, DNA copy number and exome sequences in 489 high-grade serous ovarian adenocarcinomas. The analyses help establish new tumour subtypes. Among other insights is the finding that while the gene encoding p53 tumour suppressor is mutated in almost all tumours, nine other loci including NF1, BRCA1, BRCA2, RB1 and CDK12 carry recurrent albeit low-prevalence mutations. Homologous recombination is defective in about half of the tumours studied, and Notch and FOXM1 signalling are involved in the pathophysiology.

Genetics

Biochemistry

0

Paper

Save

Initial sequencing and comparative analysis of the mouse genome

R Waterston et al.Dec 1, 2002

The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.

Genetics

Molecular Biology

4

Paper

Save

The International HapMap Project

John Belmont et al.Dec 1, 2003

The goal of the International HapMap Project is to determine the common patterns of DNA sequence variation in the human genome and to make this information freely available in the public domain. An international consortium is developing a map of these patterns across the genome by determining the genotypes of one million or more sequence variants, their frequencies and the degree of association between them, in DNA samples from populations with ancestry from parts of Africa, Asia and Europe. The HapMap will allow the discovery of sequence variants that affect common disease, will facilitate development of diagnostic tools, and will enhance our ability to choose targets for therapeutic intervention.

Genetics

Biology

0

Paper

Save

The Genome Sequence of Drosophila melanogaster

Mark Adams et al.Mar 24, 2000

The fly Drosophila melanogaster is one of the most intensively studied organisms in biology and serves as a model system for the investigation of many developmental and cellular processes common to higher eukaryotes, including humans. We have determined the nucleotide sequence of nearly all of the ∼120-megabase euchromatic portion of the Drosophila genome using a whole-genome shotgun sequencing strategy supported by extensive clone-based sequence and a high-quality bacterial artificial chromosome physical map. Efforts are under way to close the remaining gaps; however, the sequence is of sufficient accuracy and contiguity to be declared substantially complete and to support an initial analysis of genome structure and preliminary gene annotation and interpretation. The genome encodes ∼13,600 genes, somewhat fewer than the smaller Caenorhabditis elegans genome, but with comparable functional diversity.

Genetics

Molecular Biology

0

Paper

Save

A second generation human haplotype map of over 3.1 million SNPs

Kelly Frazer et al.Oct 1, 2007

We describe the Phase II HapMap, which characterizes over 3.1 million human single nucleotide polymorphisms (SNPs) genotyped in 270 individuals from four geographically diverse populations and includes 25–35% of common SNP variation in the populations surveyed. The map is estimated to capture untyped common variation with an average maximum r2 of between 0.9 and 0.96 depending on population. We demonstrate that the current generation of commercial genome-wide genotyping products captures common Phase II SNPs with an average maximum r2 of up to 0.8 in African and up to 0.95 in non-African populations, and that potential gains in power in association studies can be obtained through imputation. These data also reveal novel aspects of the structure of linkage disequilibrium. We show that 10–30% of pairs of individuals within a population share at least one region of extended genetic identity arising from recent ancestry and that up to 1% of all common variants are untaggable, primarily because they lie within recombination hotspots. We show that recombination rates vary systematically around genes and between genes of different function. Finally, we demonstrate increased differentiation at non-synonymous, compared to synonymous, SNPs, resulting from systematic differences in the strength or efficacy of natural selection between populations. The International HapMap Consortium has produced a second-generation version of its remarkable haplotype map of the human genome. The Phase II HapMap charts human genetic variation even more extensively than the original, tripling of the number of genetic markers included. The original HapMap was instrumental in making large-scale genome-wide association studies possible. An indication of how this type of work will be extended with 'HapMap2' is presented in this issue: Sabeti et al. build on previous work detecting signs of positive natural selection on human genes. With many more markers now available, they have discovered three examples of apparent population-specific selection based on geographic area — involving gene pairs linked to Lassa virus in West Africa, skin pigmentation in Europe and hair follicle development in Asia — and they speculate on how these may relate to human biology. A consortium reports the tripling of the number of genetic markers in Phase II of the International HapMap Project. This map of human genetic variation will continue to revolutionize discovery of susceptibility loci in common genetic diseases, and study of genes under selection in humans.

Genetics

Molecular Biology

0

Paper

Save

The Somatic Genomic Landscape of Glioblastoma

Stacey Gabriel et al.Oct 1, 2013

Summary

We describe the landscape of somatic genomic alterations based on multidimensional and comprehensive characterization of more than 500 glioblastoma tumors (GBMs). We identify several novel mutated genes as well as complex rearrangements of signature receptors, including EGFR and PDGFRA. TERT promoter mutations are shown to correlate with elevated mRNA expression, supporting a role in telomerase reactivation. Correlative analyses confirm that the survival advantage of the proneural subtype is conferred by the G-CIMP phenotype, and MGMT DNA methylation may be a predictive biomarker for treatment response only in classical subtype GBM. Integrative analysis of genomic and proteomic profiles challenges the notion of therapeutic inhibition of a pathway as an alternative to inhibition of the target itself. These data will facilitate the discovery of therapeutic and diagnostic target candidates, the validation of research and clinical observations and the generation of unanticipated hypotheses that can advance our molecular understanding of this lethal cancer.

Genetics

Molecular Biology

0

Paper

Save

Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes

Adam Siepel et al.Jul 15, 2005

We have conducted a comprehensive search for conserved elements in vertebrate genomes, using genome-wide multiple alignments of five vertebrate species (human, mouse, rat, chicken, and Fugu rubripes ). Parallel searches have been performed with multiple alignments of four insect species (three species of Drosophila and Anopheles gambiae ), two species of Caenorhabditis , and seven species of Saccharomyces . Conserved elements were identified with a computer program called phastCons, which is based on a two-state phylogenetic hidden Markov model (phylo-HMM). PhastCons works by fitting a phylo-HMM to the data by maximum likelihood, subject to constraints designed to calibrate the model across species groups, and then predicting conserved elements based on this model. The predicted elements cover roughly 3%–8% of the human genome (depending on the details of the calibration procedure) and substantially higher fractions of the more compact Drosophila melanogaster (37%–53%), Caenorhabditis elegans (18%–37%), and Saccharaomyces cerevisiae (47%–68%) genomes. From yeasts to vertebrates, in order of increasing genome size and general biological complexity, increasing fractions of conserved bases are found to lie outside of the exons of known protein-coding genes. In all groups, the most highly conserved elements (HCEs), by log-odds score, are hundreds or thousands of bases long. These elements share certain properties with ultraconserved elements, but they tend to be longer and less perfectly conserved, and they overlap genes of somewhat different functional categories. In vertebrates, HCEs are associated with the 3′ UTRs of regulatory genes, stable gene deserts, and megabase-sized regions rich in moderately conserved noncoding sequences. Noncoding HCEs also show strong statistical evidence of an enrichment for RNA secondary structure.

Genetics

Molecular Biology

0

Paper

Save

Genomic analyses identify molecular subtypes of pancreatic cancer

Peter Allen et al.Feb 23, 2016

Genetics

Oncology

0

Paper

Genetics

2,989

0

Save