ResearchHub | Open Science Community

Mutational heterogeneity in cancer and the search for new cancer-associated genes

Michael Lawrence et al., Jun 16, 2013

As the sample size in cancer genome studies increases, the list of genes identified as significantly mutated is likely to include more false positives; here, this problem is identified as stemming largely from mutational heterogeneity, and a new analytical methodology designed to overcome it is described. Cancer genomic approaches have identified scores of genes responsible for the initiation and progression of cancer. But as sample sizes increase, the list of putatively significant genes identified by current analytical methods continues to grow and is likely to include many false positives. This study shows that this situation stems largely from mutational heterogeneity and presents a novel methodology, MutSigCV, that overcomes the problem by incorporating mutational heterogeneity into the analysis. Application of MutSigCV to more than 3,000 tumour samples from 27 different tumour types shows that mutation frequencies vary more than 1,000-fold between the most extreme samples, both between and within tumour types. When applied to a lung cancer data set, MutSigCV reduced the list of significantly mutated genes from 450 to a more manageable 11, most of them previously reported to be mutated in squamous cell lung cancer. Major international projects are underway that aim to create a comprehensive catalogue of all the genes responsible for the initiation and progression of cancer (refs 1–9). These studies involve the sequencing of matched tumour–normal samples followed by mathematical analysis to identify those genes in which mutations occur more frequently than expected by random chance. Here we describe a fundamental problem with cancer genome studies: as the sample size increases, the list of putatively significant genes produced by current analytical methods burgeons into the hundreds.
The list includes many implausible genes (such as those encoding olfactory receptors and the muscle protein titin), suggesting extensive false-positive findings that overshadow true driver events. We show that this problem stems largely from mutational heterogeneity and provide a novel analytical methodology, MutSigCV, for resolving the problem. We apply MutSigCV to exome sequences from 3,083 tumour–normal pairs and discover extraordinary variation in mutation frequency and spectrum within cancer types, which sheds light on mutational processes and disease aetiology, and in mutation frequency across the genome, which is strongly correlated with DNA replication timing and also with transcriptional activity. By incorporating mutational heterogeneity into the analyses, MutSigCV is able to eliminate most of the apparent artefactual findings and enable the identification of genes truly associated with cancer.
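The core argument above is that a single genome-wide background mutation rate underestimates the expected mutation count for genes whose local background rate is high, so such genes look significant by chance. The toy sketch below is not MutSigCV, whose covariate model is not described in this abstract; it only compares a one-sided Poisson test under a pooled rate with the same test under a hypothetical gene-specific rate. All rates, lengths and counts (`POOLED_RATE`, `GENE_RATE`, `CODING_BASES`, and so on) are made-up illustrative values.

```python
import math


def poisson_upper_tail(k: int, mu: float) -> float:
    """Return P(X >= k) for X ~ Poisson(mu) by summing the upper-tail terms.

    Adequate for the moderate counts used in this illustration; extreme
    inputs can underflow the starting term.
    """
    if k <= 0:
        return 1.0
    if mu <= 0:
        raise ValueError("mu must be positive")
    # Probability mass at X = k, computed in log space to avoid overflow.
    term = math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))
    total = 0.0
    i = k
    # Keep adding P(X = i) until the remaining terms are negligible.
    while term > 0.0 and (i < k + 10 or term > total * 1e-17):
        total += term
        i += 1
        term *= mu / i  # P(X = i) = P(X = i - 1) * mu / i
    return min(total, 1.0)


def gene_p_value(observed: int, rate_per_base: float,
                 coding_bases: int, n_samples: int) -> float:
    """One-sided P-value for seeing >= observed mutations in one gene across a cohort."""
    expected = rate_per_base * coding_bases * n_samples
    return poisson_upper_tail(observed, expected)


# Hypothetical numbers for a long gene in a cohort of 200 tumours.
CODING_BASES = 100_000
N_SAMPLES = 200
OBSERVED = 100
POOLED_RATE = 3e-6            # assumed genome-wide average (3 per Mb)
GENE_RATE = 3 * POOLED_RATE   # assumed higher local background rate

p_pooled = gene_p_value(OBSERVED, POOLED_RATE, CODING_BASES, N_SAMPLES)
p_gene = gene_p_value(OBSERVED, GENE_RATE, CODING_BASES, N_SAMPLES)
print(f"pooled-rate P = {p_pooled:.2e}, gene-specific-rate P = {p_gene:.3f}")
```

With the pooled rate the hypothetical gene appears highly significant, while under the higher gene-specific rate the same count is unremarkable. How MutSigCV actually estimates a gene's background rate is not shown here.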

Genetics

Molecular Biology

A second generation human haplotype map of over 3.1 million SNPs

Kelly Frazer et al., Oct 1, 2007

We describe the Phase II HapMap, which characterizes over 3.1 million human single nucleotide polymorphisms (SNPs) genotyped in 270 individuals from four geographically diverse populations and includes 25–35% of common SNP variation in the populations surveyed. The map is estimated to capture untyped common variation with an average maximum r² of 0.9 to 0.96, depending on population. We demonstrate that the current generation of commercial genome-wide genotyping products captures common Phase II SNPs with an average maximum r² of up to 0.8 in African and up to 0.95 in non-African populations, and that potential gains in power in association studies can be obtained through imputation. These data also reveal novel aspects of the structure of linkage disequilibrium. We show that 10–30% of pairs of individuals within a population share at least one region of extended genetic identity arising from recent ancestry and that up to 1% of all common variants are untaggable, primarily because they lie within recombination hotspots. We show that recombination rates vary systematically around genes and between genes of different function. Finally, we demonstrate increased differentiation at non-synonymous, compared to synonymous, SNPs, resulting from systematic differences in the strength or efficacy of natural selection between populations. The International HapMap Consortium has produced a second-generation version of its remarkable haplotype map of the human genome. The Phase II HapMap charts human genetic variation even more extensively than the original, tripling the number of genetic markers included. The original HapMap was instrumental in making large-scale genome-wide association studies possible. An indication of how this type of work will be extended with 'HapMap2' is presented in this issue: Sabeti et al. build on previous work detecting signs of positive natural selection on human genes.
With many more markers now available, they have discovered three examples of apparent population-specific selection based on geographic area — involving gene pairs linked to Lassa virus in West Africa, skin pigmentation in Europe and hair follicle development in Asia — and they speculate on how these may relate to human biology. A consortium reports the tripling of the number of genetic markers in Phase II of the International HapMap Project. This map of human genetic variation will continue to revolutionize discovery of susceptibility loci in common genetic diseases, and study of genes under selection in humans.
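The coverage figures above are stated as an "average maximum r²". A minimal sketch of that statistic, assuming phased haplotypes coded 0/1: compute r² = D² / (p_A(1 − p_A) p_B(1 − p_B)), with D = p_AB − p_A p_B, between each untyped SNP and every typed SNP, keep the best value, and average over untyped SNPs. HapMap's actual coverage procedure (for example any distance windows or per-population handling) is not described in this abstract, and the haplotypes below are toy values, not HapMap data.

```python
from typing import Sequence


def r_squared(hap_a: Sequence[int], hap_b: Sequence[int]) -> float:
    """Squared correlation r^2 between two biallelic SNPs from phased haplotypes.

    Each sequence holds one allele per haplotype, coded 0 or 1.
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("need two equal-length, non-empty haplotype vectors")
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined for a monomorphic SNP")
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denom


def average_max_r2(untyped: Sequence[Sequence[int]],
                   typed: Sequence[Sequence[int]]) -> float:
    """Mean over untyped SNPs of the best r^2 achieved by any typed SNP."""
    best = [max(r_squared(u, t) for t in typed) for u in untyped]
    return sum(best) / len(best)


# Toy phased haplotypes (1 = minor allele); not HapMap data.
typed_snps = [[1, 0, 1, 0], [0, 0, 1, 1]]
untyped_snps = [[1, 1, 0, 0], [1, 1, 1, 0]]
print(average_max_r2(untyped_snps, typed_snps))
```

Here the first untyped SNP is perfectly tagged (r² = 1) and the second only weakly (r² = 1/3), so the average maximum r² is about 0.67. A SNP whose best r² stays low against every typed marker is "untaggable" in the sense used above.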

Genetics

Molecular Biology

Integrating common and rare genetic variation in diverse human populations

David Green et al., Aug 31, 2010

Despite great progress in identifying genetic variants that influence human disease, most inherited risk remains unexplained. A more complete understanding requires genome-wide studies that fully examine less common alleles in populations with a wide range of ancestry. To inform the design and interpretation of such studies, we genotyped 1.6 million common single nucleotide polymorphisms (SNPs) in 1,184 reference individuals from 11 global populations, and sequenced ten 100-kilobase regions in 692 of these individuals. This integrated data set of common and rare alleles, called ‘HapMap 3’, includes both SNPs and copy number polymorphisms (CNPs). We characterized population-specific differences among low-frequency variants, measured the improvement in imputation accuracy afforded by the larger reference panel, especially in imputing SNPs with a minor allele frequency of ≤5%, and demonstrated the feasibility of imputing newly discovered CNPs and SNPs. This expanded public resource of genome variants in global populations supports deeper interrogation of genomic variation and its role in human disease, and serves as a step towards a high-resolution map of the landscape of human genetic variation. The International HapMap Consortium, established to develop a haplotype map of the human genome describing the common patterns of DNA sequence variation, has now reached its third incarnation. HapMap1, published in 2005 (go.nature.com/gJisDm), contained more than a million SNP (single nucleotide polymorphism) genotypes generated in 269 individuals from four geographically diverse populations. Two years later, HapMap2 (go.nature.com/WttNWX) added more than 2.1 million SNPs to the original map in the same 269 individuals. With the aim of providing a resource for the latest wave of genome-wide studies focused on disease linkages, HapMap3 casts the net wider. 
About 1.6 million common SNPs were genotyped in 1,184 individuals from 11 global populations, and ten 100-kilobase regions were sequenced in 692 of these individuals. Here, the analysis of 'HapMap 3' is reported — a public data set of genomic variants in human populations. The resource integrates common and rare single nucleotide polymorphisms (SNPs) and copy number polymorphisms (CNPs) from 11 global populations, providing insights into population-specific differences among variants. It also demonstrates the feasibility of imputing newly discovered rare SNPs and CNPs.

Genetics

Molecular Biology

Genome-Wide Association Analysis Identifies Loci for Type 2 Diabetes and Triglyceride Levels

Richa Saxena et al., Apr 27, 2007

New strategies for prevention and treatment of type 2 diabetes (T2D) require improved insight into disease etiology. We analyzed 386,731 common single-nucleotide polymorphisms (SNPs) in 1464 patients with T2D and 1467 matched controls, each characterized for measures of glucose metabolism, lipids, obesity, and blood pressure. With collaborators (FUSION and WTCCC/UKT2D), we identified and confirmed three loci associated with T2D (in a noncoding region near CDKN2A and CDKN2B, in an intron of IGF2BP2, and in an intron of CDKAL1) and replicated associations near HHEX and in SLC30A8 found by a recent whole-genome association study. We identified and confirmed association of a SNP in an intron of glucokinase regulatory protein (GCKR) with serum triglycerides. The discovery of associated variants in unsuspected genes and outside coding regions illustrates the ability of genome-wide association studies to provide potentially important clues to the pathogenesis of common diseases.
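The abstract does not say which association test was applied to each SNP. A common choice for a case–control scan is a 1-degree-of-freedom allelic test on a 2×2 table of allele counts, reported with an allelic odds ratio. The sketch below assumes that test purely for illustration; the counts are hypothetical, not data from the study, and tables with a zero cell are not handled.

```python
import math


def allelic_association(case_alt: int, case_ref: int,
                        ctrl_alt: int, ctrl_ref: int):
    """Allele-count 2x2 chi-square test (1 df) and allelic odds ratio.

    Counts are alleles, not people: each diploid individual contributes two.
    """
    observed = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = sum(map(sum, observed))
    row_sums = [sum(row) for row in observed]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Upper tail of a chi-square with 1 degree of freedom: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return chi2, p_value, odds_ratio


# Hypothetical allele counts at one SNP (not data from the study).
chi2, p, odds = allelic_association(case_alt=30, case_ref=70, ctrl_alt=10, ctrl_ref=90)
print(f"chi2 = {chi2:.2f}, P = {p:.2e}, OR = {odds:.2f}")
```

In a genome-wide scan this test is repeated for every SNP, which is why very small P-value thresholds and independent confirmation, as described above, are needed before calling a locus associated.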

Genetics

Molecular Biology

The genomic complexity of primary human prostate cancer

Michael Berger et al., Feb 1, 2011

Prostate cancer is the second most common cause of male cancer deaths in the United States. However, the full range of prostate cancer genomic alterations is incompletely characterized. Here we present the complete sequence of seven primary human prostate cancers and their paired normal counterparts. Several tumours contained complex chains of balanced (that is, 'copy-neutral') rearrangements that occurred within or adjacent to known cancer genes. Rearrangement breakpoints were enriched near open chromatin, androgen receptor and ERG DNA binding sites in the setting of the ETS gene fusion TMPRSS2-ERG, but inversely correlated with these regions in tumours lacking ETS fusions. This observation suggests a link between chromatin or transcriptional regulation and the genesis of genomic aberrations. Three tumours contained rearrangements that disrupted CADM2, and four harboured events disrupting either PTEN (unbalanced events), a prostate tumour suppressor, or MAGI2 (balanced events), a PTEN interacting protein not previously implicated in prostate tumorigenesis. Thus, genomic rearrangements may arise from transcriptional or chromatin aberrancies and engage prostate tumorigenic mechanisms.

Genetics

Internal Medicine

Sequence analysis of mutations and translocations across breast cancer subtypes

Shantanu Banerji et al., Jun 19, 2012

This paper reports one of the largest breast cancer whole-exome and whole-genome sequencing efforts so far, identifying previously unknown recurrent mutations in CBFB, deletions of RUNX1 and a recurrent MAGI3–AKT3 fusion; the fusion suggests that ATP-competitive AKT inhibitors should be evaluated in clinical trials. This paper reports one of the largest whole-exome sequencing efforts in human breast cancers so far, complemented by whole-genome sequences of 22 breast cancer/normal pairs. The authors analysed diverse subtypes from patients in Mexico and Vietnam and identified recurrent mutations in the CBFB transcription factor gene and deletions of its partner RUNX1, as well as a recurrent MAGI3–AKT3 fusion enriched in triple-negative breast cancers (those lacking oestrogen and progesterone receptors and ERBB2 expression). The fusion leads to constitutive activation of AKT kinase, which can be counteracted by treatment with a small-molecule inhibitor. Breast carcinoma is the leading cause of cancer-related mortality in women worldwide, with an estimated 1.38 million new cases and 458,000 deaths in 2008 alone (ref. 1). This malignancy represents a heterogeneous group of tumours with characteristic molecular features, prognosis and responses to available therapy (refs 2–4). Recurrent somatic alterations in breast cancer have been described, including mutations and copy number alterations, notably ERBB2 amplifications, the first successful therapy target defined by a genomic aberration (ref. 5). Previous DNA sequencing studies of breast cancer genomes have revealed additional candidate mutations and gene rearrangements (refs 6–10). Here we report the whole-exome sequences of DNA from 103 human breast cancers of diverse subtypes from patients in Mexico and Vietnam compared to matched-normal DNA, together with whole-genome sequences of 22 breast cancer/normal pairs.
Beyond confirming recurrent somatic mutations in PIK3CA (ref. 11), TP53 (ref. 6), AKT1 (ref. 12), GATA3 (ref. 13) and MAP3K1 (ref. 10), we discovered recurrent mutations in the CBFB transcription factor gene and deletions of its partner RUNX1. Furthermore, we have identified a recurrent MAGI3–AKT3 fusion enriched in triple-negative breast cancer lacking oestrogen and progesterone receptors and ERBB2 expression. The MAGI3–AKT3 fusion leads to constitutive activation of AKT kinase, which is abolished by treatment with an ATP-competitive AKT small-molecule inhibitor.

Genetics

Molecular Biology

Genome-wide association of early-onset myocardial infarction with single nucleotide polymorphisms and copy number variants

Sekar Kathiresan et al., Feb 8, 2009

The Myocardial Infarction Genetics Consortium reports results of a genome-wide association study of early-onset myocardial infarction. The study analyzed common SNPs, common CNVs and rare CNVs and identified SNP alleles at three new loci associated with disease risk. We conducted a genome-wide association study testing single nucleotide polymorphisms (SNPs) and copy number variants (CNVs) for association with early-onset myocardial infarction in 2,967 cases and 3,075 controls. We carried out replication in an independent sample with an effective sample size of up to 19,492. SNPs at nine loci reached genome-wide significance: three are newly identified (21q22 near MRPS6-SLC5A3-KCNE2, 6p24 in PHACTR1 and 2q33 in WDR12) and six replicate prior observations (refs 1–4): 9p21, 1p13 near CELSR2-PSRC1-SORT1, 10q11 near CXCL12, 1q41 in MIA3, 19p13 near LDLR and 1p32 near PCSK9. We tested 554 common copy number polymorphisms (>1% allele frequency) and none met the pre-specified threshold for replication (P < 10⁻³). We identified 8,065 rare CNVs but did not detect a greater CNV burden in cases than in controls, in genes compared to the genome as a whole, or at any individual locus. SNPs at nine loci were reproducibly associated with myocardial infarction, but tests of common and rare CNVs failed to identify additional associations with myocardial infarction risk.
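The abstract quotes an "effective sample size" without defining it. One common definition for case–control designs, assumed here and not confirmed by the abstract, is N_eff = 4 / (1/N_cases + 1/N_controls): it equals the raw total for a balanced design and shrinks as the case:control ratio becomes lopsided. The function name `effective_sample_size` is illustrative.

```python
def effective_sample_size(n_cases: int, n_controls: int) -> float:
    """Case-control effective sample size, 4 / (1/n_cases + 1/n_controls).

    Equals n_cases + n_controls when the design is balanced and shrinks as it
    becomes unbalanced.
    """
    if n_cases <= 0 or n_controls <= 0:
        raise ValueError("both groups must be non-empty")
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


# Discovery counts from the abstract: 2,967 cases and 3,075 controls.
print(round(effective_sample_size(2967, 3075)))
```

Because the discovery sample is nearly balanced, this definition gives a value very close to the raw total of 6,042. How the replication figure of 19,492 was computed is not stated in the abstract.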

Genetics

Molecular Biology

Integrated detection and population-genetic analysis of SNPs and copy number variation

Steven McCarroll et al., Sep 7, 2008

Melanoma genome sequencing reveals frequent PREX2 mutations

Michael Berger et al., May 1, 2012

Whole-genome sequencing of 25 metastatic melanomas and matched germline DNA in humans reveals that the highest mutation load is associated with chronic sun exposure, and that the PREX2 gene is mutated in approximately 14 per cent of cases. Melanoma is a highly metastatic cancer, characterized by high lethality and rapid development of resistance to treatment. Whole-genome sequencing of 25 metastatic melanomas and matched germline DNA reveals that the mutation rate varies widely, with the highest mutation load associated with chronic exposure to sunlight. PREX2, a PTEN-interacting protein previously implicated in breast cancer, is mutated in approximately 14% of cases. Although the precise role of PREX2 in melanoma remains to be elucidated, ectopic expression of its mutant form accelerates tumour formation of immortalized human melanocytes in vivo. Melanoma is notable for its metastatic propensity, lethality in the advanced setting and association with ultraviolet exposure early in life (ref. 1). To obtain a comprehensive genomic view of melanoma in humans, we sequenced the genomes of 25 metastatic melanomas and matched germline DNA. A wide range of point mutation rates was observed: lowest in melanomas whose primaries arose on non-ultraviolet-exposed hairless skin of the extremities (3 and 14 per megabase (Mb) of genome), intermediate in those originating from hair-bearing skin of the trunk (5–55 per Mb), and highest in a patient with a documented history of chronic sun exposure (111 per Mb). Analysis of whole-genome sequence data identified PREX2 (phosphatidylinositol-3,4,5-trisphosphate-dependent Rac exchange factor 2), a PTEN-interacting protein and negative regulator of PTEN in breast cancer (ref. 2), as a significantly mutated gene with a mutation frequency of approximately 14% in an independent extension cohort of 107 human melanomas.
PREX2 mutations are biologically relevant, as ectopic expression of mutant PREX2 accelerated tumour formation of immortalized human melanocytes in vivo. Thus, whole-genome sequencing of human melanoma tumours revealed genomic evidence of ultraviolet pathogenesis and discovered a new recurrently mutated gene in melanoma.
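The point mutation rates above are expressed per megabase, that is, a mutation count normalised by the amount of genome analysed. The abstract does not say exactly which bases form the denominator; the sketch assumes a per-sample count of adequately covered ("callable") bases, and both that input and the sample values below are hypothetical.

```python
def mutations_per_mb(mutation_count: int, callable_bases: int) -> float:
    """Point-mutation rate expressed per megabase (10^6 bases) of analysed genome."""
    if callable_bases <= 0:
        raise ValueError("callable_bases must be positive")
    return mutation_count / (callable_bases / 1_000_000)


# Hypothetical samples: (somatic point mutations, bases with adequate coverage).
samples = {
    "sample_A": (8_000, 2_500_000_000),
    "sample_B": (75_000, 2_500_000_000),
}
for name, (count, bases) in samples.items():
    print(name, round(mutations_per_mb(count, bases), 1))
```

Normalising by analysed bases rather than by raw mutation count keeps rates comparable between genomes sequenced to different depths or completeness.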

Genetics

Oncology

Genetic variants near TNFAIP3 on 6q23 are associated with systemic lupus erythematosus

Robert Graham et al., Aug 1, 2008

Patrick Gaffney and colleagues report results of a genome-wide association study for systemic lupus erythematosus (SLE), identifying variants in the TNFAIP3 region on 6q23 that are strongly associated with the disease. In a related study, Lindsey Criswell and colleagues report a similar association between variants near TNFAIP3 and SLE. The same region on 6q23 has recently been associated with rheumatoid arthritis, but only a subset of risk alleles in this region seem to be common to both diseases. Systemic lupus erythematosus (SLE) is an autoimmune disease influenced by genetic and environmental factors. We carried out a genome-wide association scan and replication study and found an association between SLE and a variant in TNFAIP3 (rs5029939, meta-analysis P = 2.89 × 10⁻¹², OR = 2.29). We also found evidence of two independent signals near TNFAIP3 associated with SLE, including one previously associated with rheumatoid arthritis (RA). These results establish that variants near TNFAIP3 contribute to differential risk of SLE and RA.
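The abstract reports a meta-analysis P-value and odds ratio but not how the scan and replication results were combined. One standard approach, assumed here purely for illustration, is an inverse-variance fixed-effect meta-analysis of per-study log odds ratios: each study is weighted by 1/SE², and the pooled estimate is tested with a two-sided z-test. The inputs below are hypothetical, not the study's data.

```python
import math


def fixed_effect_meta(log_odds_ratios, standard_errors):
    """Inverse-variance fixed-effect meta-analysis of per-study log odds ratios.

    Returns (pooled odds ratio, two-sided P-value).
    """
    if not log_odds_ratios or len(log_odds_ratios) != len(standard_errors):
        raise ValueError("need matching, non-empty inputs")
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled_beta = sum(w * b for w, b in zip(weights, log_odds_ratios)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled_beta / pooled_se
    # Two-sided normal tail: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return math.exp(pooled_beta), p_two_sided


# Two hypothetical studies with the same effect size (not the study's data).
pooled_or, p = fixed_effect_meta([math.log(2.0), math.log(2.0)], [0.2, 0.2])
print(f"OR = {pooled_or:.2f}, P = {p:.1e}")
```

Combining two consistent studies leaves the odds ratio unchanged but shrinks the standard error by a factor of √2, which is how a meta-analysis can reach a smaller P-value than either stage alone.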

Genetics

Immunology
