ResearchHub | Open Science Community

A global reference for human genetic variation

Alexandra Roa et al., Sep 29, 2015

The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.

Results for the final phase of the 1000 Genomes Project are presented, including whole-genome sequencing, targeted exome sequencing, and genotyping on high-density SNP arrays for 2,504 individuals across 26 populations, providing a global reference data set to support biomedical genetics.

The 1000 Genomes Project has sought to comprehensively catalogue human genetic variation across populations, providing a valuable public genomic resource. The data obtained so far have found applications ranging from association studies and fine mapping studies to the filtering of likely neutral variants in rare-disease cohorts. The authors now report on the final phase of the project, phase 3, which covers previously uncharacterized areas of human genetic diversity in terms of the populations sampled and categories of characterized variation. The sample now includes more than 2,500 individuals from 26 global populations, with low-coverage whole-genome and deep exome sequencing, as well as dense microarray genotyping. They find that while most common variants are shared across populations, rarer variants are often restricted to closely related populations. The authors also demonstrate the use of the phase 3 dataset as a reference panel for imputation to improve the resolution in genetic association studies.

Genetics

Molecular Biology

MET Amplification Leads to Gefitinib Resistance in Lung Cancer by Activating ERBB3 Signaling

Jeffrey Engelman et al., Apr 27, 2007

The epidermal growth factor receptor (EGFR) kinase inhibitors gefitinib and erlotinib are effective treatments for lung cancers with EGFR activating mutations, but these tumors invariably develop drug resistance. Here, we describe a gefitinib-sensitive lung cancer cell line that developed resistance to gefitinib as a result of focal amplification of the MET proto-oncogene. Inhibition of MET signaling in these cells restored their sensitivity to gefitinib. MET amplification was detected in 4 of 18 (22%) lung cancer specimens that had developed resistance to gefitinib or erlotinib. We find that amplification of MET causes gefitinib resistance by driving ERBB3 (HER3)–dependent activation of PI3K, a pathway thought to be specific to EGFR/ERBB family receptors. Thus, we propose that MET amplification may promote drug resistance in other ERBB-driven cancers as well.

Genetics

Oncology

Global variation in copy number in the human genome

Richard Redon et al., Nov 1, 2006

Copy number variation (CNV) of DNA sequences is functionally significant but has yet to be fully ascertained. We have constructed a first-generation CNV map of the human genome through the study of 270 individuals from four populations with ancestry in Europe, Africa or Asia (the HapMap collection). DNA from these individuals was screened for CNV using two complementary technologies: single-nucleotide polymorphism (SNP) genotyping arrays, and clone-based comparative genomic hybridization. A total of 1,447 copy number variable regions (CNVRs), which can encompass overlapping or adjacent gains or losses, covering 360 megabases (12% of the genome) were identified in these populations. These CNVRs contained hundreds of genes, disease loci, functional elements and segmental duplications. Notably, the CNVRs encompassed more nucleotide content per genome than SNPs, underscoring the importance of CNV in genetic diversity and evolution. The data obtained delineate linkage disequilibrium patterns for many CNVs, and reveal marked variation in copy number among populations. We also demonstrate the utility of this resource for genetic disease studies.

Where to next after sequencing the human genome? We want to know how human genomes differ from each other. Last year the International HapMap Project published a map of single nucleotide changes, and now an international consortium has mapped even larger areas of differences, called copy number variants (CNVs). Each CNV involves at least 1,000 base-pair differences between individuals, and they have been linked to both benign and disease-causing changes in the genome. The new map is based on analysis of DNA from 270 individuals. Over 1,400 CNVs were found, covering 12% of the genome. This makes them far more prevalent than was thought, and suggests that unless analysed for directly, these differences could be missed by present strategies used to identify genes mutated in genetic diseases.

Genetics

Plant Science

Recurrent Fusion of TMPRSS2 and ETS Transcription Factor Genes in Prostate Cancer

Scott Tomlins et al., Oct 27, 2005

Recurrent chromosomal rearrangements have not been well characterized in common carcinomas. We used a bioinformatics approach to discover candidate oncogenic chromosomal aberrations on the basis of outlier gene expression. Two ETS transcription factors, ERG and ETV1, were identified as outliers in prostate cancer. We identified recurrent gene fusions of the 5′ untranslated region of TMPRSS2 to ERG or ETV1 in prostate cancer tissues with outlier expression. By using fluorescence in situ hybridization, we demonstrated that 23 of 29 prostate cancer samples harbor rearrangements in ERG or ETV1. Cell line experiments suggest that the androgen-responsive promoter elements of TMPRSS2 mediate the overexpression of ETS family members in prostate cancer. These results have implications in the development of carcinomas and the molecular diagnosis and treatment of prostate cancer.

Genetics

Internal Medicine

Integrating common and rare genetic variation in diverse human populations

Fumihiko Takeuchi et al., Aug 31, 2010

Despite great progress in identifying genetic variants that influence human disease, most inherited risk remains unexplained. A more complete understanding requires genome-wide studies that fully examine less common alleles in populations with a wide range of ancestry. To inform the design and interpretation of such studies, we genotyped 1.6 million common single nucleotide polymorphisms (SNPs) in 1,184 reference individuals from 11 global populations, and sequenced ten 100-kilobase regions in 692 of these individuals. This integrated data set of common and rare alleles, called ‘HapMap 3’, includes both SNPs and copy number polymorphisms (CNPs). We characterized population-specific differences among low-frequency variants, measured the improvement in imputation accuracy afforded by the larger reference panel, especially in imputing SNPs with a minor allele frequency of ≤5%, and demonstrated the feasibility of imputing newly discovered CNPs and SNPs. This expanded public resource of genome variants in global populations supports deeper interrogation of genomic variation and its role in human disease, and serves as a step towards a high-resolution map of the landscape of human genetic variation.

The International HapMap Consortium, established to develop a haplotype map of the human genome describing the common patterns of DNA sequence variation, has now reached its third incarnation. HapMap1, published in 2005 (go.nature.com/gJisDm), contained more than a million SNP (single nucleotide polymorphism) genotypes generated in 269 individuals from four geographically diverse populations. Two years later, HapMap2 (go.nature.com/WttNWX) added more than 2.1 million SNPs to the original map in the same 269 individuals. With the aim of providing a resource for the latest wave of genome-wide studies focused on disease linkages, HapMap3 casts the net wider. About 1.6 million common SNPs were genotyped in 1,184 individuals from 11 global populations, and ten 100-kilobase regions were sequenced in 692 of these individuals.

Here, the analysis of 'HapMap 3', a public data set of genomic variants in human populations, is reported. The resource integrates common and rare single nucleotide polymorphisms (SNPs) and copy number polymorphisms (CNPs) from 11 global populations, providing insights into population-specific differences among variants. It also demonstrates the feasibility of imputing newly discovered rare SNPs and CNPs.

Genetics

Molecular Biology

Detection of large-scale variation in the human genome

A. Iafrate et al., Aug 1, 2004

We identified 255 loci across the human genome that contain genomic imbalances among unrelated individuals. Twenty-four variants are present in > 10% of the individuals that we examined. Half of these regions overlap with genes, and many coincide with segmental duplications or gaps in the human genome assembly. This previously unappreciated heterogeneity may underlie certain human phenotypic variation and susceptibility to disease and argues for a more dynamic human genome structure.

Genetics

Molecular Biology

An integrated map of structural variation in 2,504 human genomes

Peter Sudmant et al., Sep 29, 2015

Structural variants are implicated in numerous diseases and make up the majority of varying nucleotides among human genomes. Here we describe an integrated set of eight structural variant classes comprising both balanced and unbalanced variants, which we constructed using short-read DNA sequencing data and statistically phased onto haplotype blocks in 26 human populations. Analysing this set, we identify numerous gene-intersecting structural variants exhibiting population stratification and describe naturally occurring homozygous gene knockouts that suggest the dispensability of a variety of human genes. We demonstrate that structural variants are enriched on haplotypes identified by genome-wide association studies and exhibit enrichment for expression quantitative trait loci. Additionally, we uncover appreciable levels of structural variant complexity at different scales, including genic loci subject to clusters of repeated rearrangement and complex structural variants with multiple breakpoints likely to have formed through individual mutational events. Our catalogue will enhance future studies into structural variant demography, functional impact and disease association.

The Structural Variation Analysis Group of The 1000 Genomes Project reports an integrated structural variation map based on discovery and genotyping of eight major structural variation classes in genomes for 2,504 unrelated individuals from across 26 populations. They characterize structural variation within and between populations and quantify its functional effect. The authors further create a phased reference panel that will be valuable for population genetic and disease association studies.

Genetics

Demography

Comprehensive and Integrative Genomic Characterization of Hepatocellular Carcinoma

Adrian Ally et al., Jun 1, 2017

Liver cancer has the second highest worldwide cancer mortality rate and has limited therapeutic options. We analyzed 363 hepatocellular carcinoma (HCC) cases by whole-exome sequencing and DNA copy number analyses, and we analyzed 196 HCC cases by DNA methylation, RNA, miRNA, and proteomic expression also. DNA sequencing and mutation analysis identified significantly mutated genes, including LZTR1, EEF1A1, SF3B1, and SMARCA4. Significant alterations by mutation or downregulation by hypermethylation in genes likely to result in HCC metabolic reprogramming (ALB, APOB, and CPS1) were observed. Integrative molecular HCC subtyping incorporating unsupervised clustering of five data platforms identified three subtypes, one of which was associated with poorer prognosis in three HCC cohorts. Integrated analyses enabled development of a p53 target gene expression signature correlating with poor survival. Potential therapeutic targets for which inhibitors exist include WNT signaling, MDM4, MET, VEGFA, MCL1, IDH1, TERT, and immune checkpoint proteins CTLA-4, PD-1, and PD-L1.

Genetics

Molecular Biology

Origins and functional impact of copy number variation in the human genome

Donald Conrad et al., Oct 7, 2009

Structural variations of DNA greater than 1 kilobase in size account for most bases that vary among human genomes, but are still relatively under-ascertained. Here we use tiling oligonucleotide microarrays, comprising 42 million probes, to generate a comprehensive map of 11,700 copy number variations (CNVs) greater than 443 base pairs, of which most (8,599) have been validated independently. For 4,978 of these CNVs, we generated reference genotypes from 450 individuals of European, African or East Asian ancestry. The predominant mutational mechanisms differ among CNV size classes. Retrotransposition has duplicated and inserted some coding and non-coding DNA segments randomly around the genome. Furthermore, by correlation with known trait-associated single nucleotide polymorphisms (SNPs), we identified 30 loci with CNVs that are candidates for influencing disease susceptibility. Despite this, having assessed the completeness of our map and the patterns of linkage disequilibrium between CNVs and SNPs, we conclude that, for complex traits, the heritability void left by genome-wide association studies will not be accounted for by common CNVs.

Copy number variations or CNVs are a common form of genetic variation between individuals, caused by genomic rearrangements, either inherited or due to de novo mutation. A major collaborative effort using tiling oligonucleotide microarrays and HapMap samples has generated a comprehensive working map of 11,700 CNVs in the human genome. About half of these were also genotyped in individuals of different ancestry: European, African or East Asian. Thirty loci with CNVs that are candidates for influencing disease susceptibility were identified. Published online last October, this vast data set is a landmark in terms of completeness and spatial resolution, and as John Armour wrote in News & Views, is likely to stand as a definitive resource for years to come. This resource is the main focus of a new genome-wide association study, from the Wellcome Trust Case Control Consortium, of the links between common CNVs and eight common human diseases. Providing a wealth of technical insights to inform future study design and analysis, the Wellcome study also implies that common CNVs that can be genotyped using existing platforms are unlikely to have a major role in the genetic basis of common diseases.

Much genetic variation among humans can be accounted for by structural DNA differences that are greater than 1 kilobase in size. Here, using tiling oligonucleotide arrays and HapMap samples, a map of 11,700 copy number variations (CNVs) bigger than 443 base pairs has been generated. About half of these CNVs were also genotyped in individuals of different ancestry. The results offer insight into the relative prevalence of mechanisms that generate CNVs, their evolution, and their contribution to complex genetic diseases.

Genetics

Plant Science

Integrative genomic analyses identify MITF as a lineage survival oncogene amplified in malignant melanoma

Levi Garraway et al., Jul 1, 2005

Genetics

Molecular Biology
