ResearchHub | Open Science Community

Global variation in copy number in the human genome

Richard Redon et al. Nov 1, 2006

Copy number variation (CNV) of DNA sequences is functionally significant but has yet to be fully ascertained. We have constructed a first-generation CNV map of the human genome through the study of 270 individuals from four populations with ancestry in Europe, Africa or Asia (the HapMap collection). DNA from these individuals was screened for CNV using two complementary technologies: single-nucleotide polymorphism (SNP) genotyping arrays, and clone-based comparative genomic hybridization. A total of 1,447 copy number variable regions (CNVRs), which can encompass overlapping or adjacent gains or losses, covering 360 megabases (12% of the genome) were identified in these populations. These CNVRs contained hundreds of genes, disease loci, functional elements and segmental duplications. Notably, the CNVRs encompassed more nucleotide content per genome than SNPs, underscoring the importance of CNV in genetic diversity and evolution. The data obtained delineate linkage disequilibrium patterns for many CNVs, and reveal marked variation in copy number among populations. We also demonstrate the utility of this resource for genetic disease studies.
Where to next after sequencing the human genome? We want to know how human genomes differ from each other. Last year the International HapMap Project published a map of single nucleotide changes, and now an international consortium has mapped even larger areas of differences, called copy number variants (CNVs). Each CNV involves at least 1,000 base-pair differences between individuals, and they have been linked to both benign and disease-causing changes in the genome. The new map is based on analysis of DNA from 270 individuals. Over 1,400 CNVs were found, covering 12% of the genome. This makes them far more prevalent than was thought, and suggests that unless analysed for directly, these differences could be missed by present strategies used to identify genes mutated in genetic diseases.
Last year the first map of single nucleotide changes was published; now an international consortium has mapped even larger areas of differences, called copy number variants. These variants are at least 1,000-base-pair differences between individual people, and have been linked to both benign and disease-causing changes in the human genome.
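The abstract above describes CNVRs as regions that "can encompass overlapping or adjacent gains or losses". A minimal sketch of that idea follows, assuming CNV calls are given as half-open (chromosome, start, end) intervals; the function name, the `max_gap` parameter and the toy coordinates are illustrative assumptions, not the authors' pipeline.

```python
from typing import List, Tuple

# (chromosome, start, end) with half-open coordinates; hypothetical input format
Interval = Tuple[str, int, int]


def merge_into_cnvrs(calls: List[Interval], max_gap: int = 0) -> List[Interval]:
    """Merge overlapping or adjacent CNV calls into copy number variable regions."""
    regions: List[Interval] = []
    for chrom, start, end in sorted(calls):
        same_chrom = bool(regions) and regions[-1][0] == chrom
        if same_chrom and start <= regions[-1][2] + max_gap:
            # Overlaps or touches the previous region: extend it.
            prev = regions[-1]
            regions[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            regions.append((chrom, start, end))
    return regions


# Toy calls (made-up coordinates): two overlapping, one adjacent, one separate.
calls = [("chr1", 100, 500), ("chr1", 400, 900), ("chr1", 900, 1200), ("chr2", 50, 80)]
cnvrs = merge_into_cnvrs(calls)
covered_bases = sum(end - start for _, start, end in cnvrs)
print(cnvrs, covered_bases)
```

Summing the merged lengths, rather than the raw call lengths, avoids double-counting bases when reporting how much of a genome the regions cover.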

Genetics

Plant Science

The zebrafish reference genome sequence and its relationship to the human genome

Kerstin Howe et al. Apr 16, 2013

A high-quality sequence assembly of the zebrafish genome reveals the largest gene set of any vertebrate and provides information on key genomic features, and comparison to the human reference genome shows that approximately 70% of human protein-coding genes have at least one clear zebrafish orthologue.
The genome of the zebrafish, a key model organism for the study of development and human disease, has now been sequenced and published as a well-annotated reference genome. Zebrafish turns out to have the largest gene set of any vertebrate so far sequenced, and few pseudogenes. Importantly for disease studies, comparison between human and zebrafish sequences reveals that 70% of human genes have at least one obvious zebrafish orthologue. A second paper reports on an ongoing effort to identify and phenotype disruptive mutations in every zebrafish protein-coding gene. Using the reference genome sequence along with high-throughput sequencing and efficient chemical mutagenesis, the project's initial results cover 38% of all known protein-coding genes and describe the phenotypic consequences of more than 1,000 alleles. The long-term goal is the creation of a knockout allele in every protein-coding gene in the zebrafish genome. All mutant alleles and data are freely available at go.nature.com/en6mos.
Zebrafish have become a popular organism for the study of vertebrate gene function [1,2]. The virtually transparent embryos of this species, and the ability to accelerate genetic studies by gene knockdown or overexpression, have led to the widespread use of zebrafish in the detailed investigation of vertebrate gene function and, increasingly, the study of human genetic disease [3,4,5]. However, for effective modelling of human genetic disease it is important to understand the extent to which zebrafish genes and gene structures are related to orthologous human genes. To examine this, we generated a high-quality sequence assembly of the zebrafish genome, made up of an overlapping set of completely sequenced large-insert clones that were ordered and oriented using a high-resolution high-density meiotic map. Detailed automatic and manual annotation provides evidence of more than 26,000 protein-coding genes [6], the largest gene set of any vertebrate so far sequenced. Comparison to the human reference genome shows that approximately 70% of human genes have at least one obvious zebrafish orthologue. In addition, the high quality of this genome assembly provides a clearer understanding of key genomic features such as a unique repeat content, a scarcity of pseudogenes, an enrichment of zebrafish-specific genes on chromosome 4 and chromosomal regions that influence sex determination.

Genetics

Molecular Biology

Accurate whole human genome sequencing using reversible terminator chemistry

David Bentley et al. Nov 1, 2008

DNA sequence information underpins genetic research, enabling discoveries of important biological or medical benefit. Sequencing projects have traditionally used long (400–800 base pair) reads, but the existence of reference sequences for the human and many other genomes makes it possible to develop new, fast approaches to re-sequencing, whereby shorter reads are compared to a reference to identify intraspecies genetic variation. Here we report an approach that generates several billion bases of accurate nucleotide sequence per experiment at low cost. Single molecules of DNA are attached to a flat surface, amplified in situ and used as templates for synthetic sequencing with fluorescent reversible terminator deoxyribonucleotides. Images of the surface are analysed to generate high-quality sequence. We demonstrate application of this approach to human genome sequencing on flow-sorted X chromosomes and then scale the approach to determine the genome sequence of a male Yoruba from Ibadan, Nigeria. We build an accurate consensus sequence from >30× average depth of paired 35-base reads. We characterize four million single-nucleotide polymorphisms and four hundred thousand structural variants, many of which were previously unknown. Our approach is effective for accurate, rapid and economical whole-genome re-sequencing and many other biomedical applications.
The power of the latest massively parallel synthetic DNA sequencing technologies is demonstrated in two major collaborations that shed light on the nature of genomic variation with ethnicity. The first describes the genomic characterization of an individual from the Yoruba ethnic group of west Africa. The second reports a personal genome of a Han Chinese, the group comprising 30% of the world's population. These new resources can now be used in conjunction with the Venter, Watson and NIH reference sequences.
A separate study looked at genetic ethnicity on the continental scale, based on data from 1,387 individuals from more than 30 European countries. Overall there was little genetic variation between countries, but the differences that do exist correspond closely to the geographic map. Statistical analysis of the genome data places 50% of the individuals within 310 km of their reported origin. As well as its relevance for testing genetic ancestry, this work has implications for evaluating genome-wide association studies that link genes with diseases.
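To give a sense of scale for the ">30× average depth of paired 35-base reads" quoted in the sequencing abstract above, the sketch below applies the standard relation depth = (number of reads × read length) / genome size. The 3-gigabase genome size is a round-number assumption for illustration, and the function names are hypothetical; neither comes from the paper.

```python
def reads_needed(depth: int, genome_size: int, read_length: int) -> int:
    """Smallest number of reads giving at least `depth`-fold average coverage."""
    total_bases = depth * genome_size
    # Integer ceiling division avoids floating-point rounding on large counts.
    return -(-total_bases // read_length)


def mean_depth(n_reads: int, read_length: int, genome_size: int) -> float:
    """Average coverage implied by a given number of reads."""
    return n_reads * read_length / genome_size


GENOME_SIZE = 3_000_000_000  # assumed round figure, not taken from the abstract
n = reads_needed(depth=30, genome_size=GENOME_SIZE, read_length=35)
print(n, mean_depth(n, 35, GENOME_SIZE))
```

Under these assumptions, roughly 2.6 billion 35-base reads are needed, which illustrates why a platform yielding "several billion bases" per experiment matters for whole-genome re-sequencing.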

Genetics

Molecular Biology

A DNA damage checkpoint response in telomere-initiated senescence

Fabrizio Fagagna et al. Nov 1, 2003

Genetics

Physiology

Massive Genomic Rearrangement Acquired in a Single Catastrophic Event during Cancer Development

Philip Stephens et al. Jan 1, 2011

Summary

Cancer is driven by somatically acquired point mutations and chromosomal rearrangements, conventionally thought to accumulate gradually over time. Using next-generation sequencing, we characterize a phenomenon, which we term chromothripsis, whereby tens to hundreds of genomic rearrangements occur in a one-off cellular crisis. Rearrangements involving one or a few chromosomes crisscross back and forth across involved regions, generating frequent oscillations between two copy number states. These genomic hallmarks are highly improbable if rearrangements accumulate over time and instead imply that nearly all occur during a single cellular catastrophe. The stamp of chromothripsis can be seen in at least 2%–3% of all cancers, across many subtypes, and is present in ∼25% of bone cancers. We find that one, or indeed more than one, cancer-causing lesion can emerge out of the genomic crisis. This phenomenon has important implications for the origins of genomic remodeling and temporal emergence of cancer.
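The summary above notes that rearranged regions show "frequent oscillations between two copy number states". As a toy illustration only (not the authors' statistical analysis), the sketch below counts the distinct copy number states and the number of state switches along an ordered list of segments; the function name and the example profiles are made up.

```python
from typing import List, Tuple


def summarize_oscillation(copy_numbers: List[int]) -> Tuple[List[int], int]:
    """Return the distinct copy number states and the count of adjacent switches."""
    states = sorted(set(copy_numbers))
    switches = sum(1 for a, b in zip(copy_numbers, copy_numbers[1:]) if a != b)
    return states, switches


# Hypothetical segment profiles along one chromosome arm.
oscillating = [2, 1, 2, 1, 2, 1, 2]   # many switches, only two states
stepwise = [2, 3, 4, 3, 5, 2]         # many switches, several states

print(summarize_oscillation(oscillating))
print(summarize_oscillation(stepwise))
```

A profile with many switches but only two states matches the pattern the summary describes, whereas a profile that visits many different states is what one might expect if gains and losses accumulated over time.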

Genetics

Molecular Biology

Origins and functional impact of copy number variation in the human genome

Donald Conrad et al. Oct 7, 2009

Structural variations of DNA greater than 1 kilobase in size account for most bases that vary among human genomes, but are still relatively under-ascertained. Here we use tiling oligonucleotide microarrays, comprising 42 million probes, to generate a comprehensive map of 11,700 copy number variations (CNVs) greater than 443 base pairs, of which most (8,599) have been validated independently. For 4,978 of these CNVs, we generated reference genotypes from 450 individuals of European, African or East Asian ancestry. The predominant mutational mechanisms differ among CNV size classes. Retrotransposition has duplicated and inserted some coding and non-coding DNA segments randomly around the genome. Furthermore, by correlation with known trait-associated single nucleotide polymorphisms (SNPs), we identified 30 loci with CNVs that are candidates for influencing disease susceptibility. Despite this, having assessed the completeness of our map and the patterns of linkage disequilibrium between CNVs and SNPs, we conclude that, for complex traits, the heritability void left by genome-wide association studies will not be accounted for by common CNVs.
Copy number variations or CNVs are a common form of genetic variation between individuals, caused by genomic rearrangements, either inherited or due to de novo mutation. A major collaborative effort using tiling oligonucleotide microarrays and HapMap samples has generated a comprehensive working map of 11,700 CNVs in the human genome. About half of these were also genotyped in individuals of different ancestry (European, African or East Asian). Thirty loci with CNVs that are candidates for influencing disease susceptibility were identified. Published online last October, this vast data set is a landmark in terms of completeness and spatial resolution, and as John Armour wrote in News & Views, is likely to stand as a definitive resource for years to come.
This resource is the main focus of a new genome-wide association study, from the Wellcome Trust Case Control Consortium, of the links between common CNVs and eight common human diseases. Providing a wealth of technical insights to inform future study design and analysis, the Wellcome study also implies that common CNVs that can be genotyped using existing platforms are unlikely to have a major role in the genetic basis of common diseases.
Much genetic variation among humans can be accounted for by structural DNA differences that are greater than 1 kilobase in size. Here, using tiling oligonucleotide arrays and HapMap samples, a map of 11,700 copy number variations (CNVs) bigger than 443 base pairs has been generated. About half of these CNVs were also genotyped in individuals of different ancestry. The results offer insight into the relative prevalence of mechanisms that generate CNVs, their evolution, and their contribution to complex genetic diseases.

Genetics

Plant Science

DECIPHER: Database of Chromosomal Imbalance and Phenotype in Humans Using Ensembl Resources

Helen Firth et al. Apr 1, 2009

Genetics

Cancer Research

Degenerate oligonucleotide-primed PCR: General amplification of target DNA by a single degenerate primer

H. Telenius et al. Jul 1, 1992

A version of the polymerase chain reaction (PCR), termed degenerate oligonucleotide-primed PCR (DOP-PCR), which employs oligonucleotides of partially degenerate sequence, has been developed for genome mapping studies. This degeneracy, together with a PCR protocol utilizing a low initial annealing temperature, ensures priming from multiple (e.g., ∼10^6 in human) evenly dispersed sites within a given genome. Furthermore, as efficient amplification is achieved from the genomes of all species tested using the same primer, the method appears to be species-independent. Thus, for the general amplification of target DNA, DOP-PCR has advantages over interspersed repetitive sequence PCR (IRS-PCR), which relies on the appropriate positioning of species-specific repeat elements. In conjunction with chromosome flow sorting, DOP-PCR has been applied to the characterization of abnormal chromosomes and also to the cloning of new markers for specific chromosome regions. DOP-PCR therefore represents a rapid, efficient, and species-independent technique for general DNA amplification.
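The abstract above relies on a primer of "partially degenerate sequence". A minimal sketch of what degeneracy means in practice: each degenerate position, written with a standard IUPAC nucleotide code, stands for several bases, so one written primer represents many concrete sequences. The primer strings below are short hypothetical examples, not the primer used in the paper.

```python
from itertools import product
from typing import List

# Standard IUPAC nucleotide codes and the bases each one represents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand(primer: str) -> List[str]:
    """Enumerate every concrete sequence (only sensible for low degeneracy)."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in primer.upper()))]


short_primer = "GANNTG"       # hypothetical: two fully degenerate positions
longer_primer = "GANNNNNNTG"  # hypothetical: six fully degenerate positions

print(degeneracy(short_primer), expand(short_primer)[:3])
print(degeneracy(longer_primer))
```

The fixed bases at the ends of such a primer constrain where it can anneal, while the degenerate core lets the same oligonucleotide pool match many different sites.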

Genetics

Molecular Biology

Diet and the evolution of human amylase gene copy number variation

George Perry et al. Sep 9, 2007

Starch consumption is a prominent characteristic of agricultural societies and hunter-gatherers in arid environments. In contrast, rainforest and circum-arctic hunter-gatherers and some pastoralists consume much less starch [1,2,3]. This behavioral variation raises the possibility that different selective pressures have acted on amylase, the enzyme responsible for starch hydrolysis [4]. We found that copy number of the salivary amylase gene (AMY1) is correlated positively with salivary amylase protein level and that individuals from populations with high-starch diets have, on average, more AMY1 copies than those with traditionally low-starch diets. Comparisons with other loci in a subset of these populations suggest that the extent of AMY1 copy number differentiation is highly unusual. This example of positive selection on a copy number–variable gene is, to our knowledge, one of the first discovered in the human genome. Higher AMY1 copy numbers and protein levels probably improve the digestion of starchy foods and may buffer against the fitness-reducing effects of intestinal disease.

Genetics

Biochemistry

Paired-End Mapping Reveals Extensive Structural Variation in the Human Genome

Jan Korbel et al. Sep 28, 2007

Structural variation of the genome involves kilobase- to megabase-sized deletions, duplications, insertions, inversions, and complex combinations of rearrangements. We introduce high-throughput and massive paired-end mapping (PEM), a large-scale genome-sequencing method to identify structural variants (SVs) approximately 3 kilobases (kb) or larger that combines the rescue and capture of paired ends of 3-kb fragments, massive 454 sequencing, and a computational approach to map DNA reads onto a reference genome. PEM was used to map SVs in an African and in a putatively European individual and identified shared and divergent SVs relative to the reference genome. Overall, we fine-mapped more than 1000 SVs and documented that the number of SVs among humans is much larger than initially hypothesized; many of the SVs potentially affect gene function. The breakpoint junction sequences of more than 200 SVs were determined with a novel pooling strategy and computational analysis. Our analysis provided insights into the mechanisms of SV formation in humans.
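The paired-end mapping idea in the abstract above can be sketched as a simple rule: the two ends of a roughly 3-kb fragment should map about 3 kb apart on the reference, and pairs that deviate hint at a structural variant. The tolerance value, the orientation flag and the labels below are illustrative assumptions, not the thresholds or the computational method used in the paper.

```python
def classify_pair(span: int,
                  expected_orientation: bool = True,
                  expected_span: int = 3000,
                  tolerance: int = 1000) -> str:
    """Label a mapped read pair by comparing its reference span to the expected one.

    span: distance between the two mapped ends on the reference, in base pairs.
    expected_orientation: False if the ends map in an unexpected relative orientation.
    """
    if not expected_orientation:
        return "inversion_candidate"
    if span > expected_span + tolerance:
        # Ends are farther apart on the reference than in the sample fragment:
        # sequence present in the reference is missing from the sample.
        return "deletion_candidate"
    if span < expected_span - tolerance:
        # Ends are closer on the reference: the sample carries extra sequence.
        return "insertion_candidate"
    return "concordant"


# Hypothetical mapped pairs: (span, orientation_ok)
pairs = [(3100, True), (9500, True), (800, True), (3000, False)]
labels = [classify_pair(span, ok) for span, ok in pairs]
print(labels)
```

In practice a single discordant pair is weak evidence, so calls are usually supported by several pairs clustering at the same location; that clustering step is omitted here.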

Genetics

Molecular Biology
