ResearchHub | Open Science Community

The zebrafish reference genome sequence and its relationship to the human genome

Kerstin Howe et al. Apr 16, 2013

A high-quality sequence assembly of the zebrafish genome reveals the largest gene set of any vertebrate and provides information on key genomic features, and comparison to the human reference genome shows that approximately 70% of human protein-coding genes have at least one clear zebrafish orthologue.

The genome of the zebrafish, a key model organism for the study of development and human disease, has now been sequenced and published as a well-annotated reference genome. Zebrafish turns out to have the largest gene set of any vertebrate so far sequenced, and few pseudogenes. Importantly for disease studies, comparison between human and zebrafish sequences reveals that 70% of human genes have at least one obvious zebrafish orthologue. A second paper reports on an ongoing effort to identify and phenotype disruptive mutations in every zebrafish protein-coding gene. Using the reference genome sequence along with high-throughput sequencing and efficient chemical mutagenesis, the authors describe the phenotypic consequences of more than 1,000 alleles in the project's initial results, which cover 38% of all known protein-coding genes. The long-term goal is the creation of a knockout allele in every protein-coding gene in the zebrafish genome. All mutant alleles and data are freely available at go.nature.com/en6mos.

Zebrafish have become a popular organism for the study of vertebrate gene function [1,2]. The virtually transparent embryos of this species, and the ability to accelerate genetic studies by gene knockdown or overexpression, have led to the widespread use of zebrafish in the detailed investigation of vertebrate gene function and, increasingly, the study of human genetic disease [3,4,5]. However, for effective modelling of human genetic disease it is important to understand the extent to which zebrafish genes and gene structures are related to orthologous human genes. To examine this, we generated a high-quality sequence assembly of the zebrafish genome, made up of an overlapping set of completely sequenced large-insert clones that were ordered and oriented using a high-resolution high-density meiotic map. Detailed automatic and manual annotation provides evidence of more than 26,000 protein-coding genes [6], the largest gene set of any vertebrate so far sequenced. Comparison to the human reference genome shows that approximately 70% of human genes have at least one obvious zebrafish orthologue. In addition, the high quality of this genome assembly provides a clearer understanding of key genomic features such as a unique repeat content, a scarcity of pseudogenes, an enrichment of zebrafish-specific genes on chromosome 4 and chromosomal regions that influence sex determination.

Genetics

Molecular Biology

A physical, genetic and functional sequence assembly of the barley genome

Klaus Mayer et al. Oct 16, 2012

Barley (Hordeum vulgare L.) is among the world’s earliest domesticated and most important crop plants. It is diploid with a large haploid genome of 5.1 gigabases (Gb). Here we present an integrated and ordered physical, genetic and functional sequence resource that describes the barley gene-space in a structured whole-genome context. We developed a physical map of 4.98 Gb, with more than 3.90 Gb anchored to a high-resolution genetic map. Projecting a deep whole-genome shotgun assembly, complementary DNA and deep RNA sequence data onto this framework supports 79,379 transcript clusters, including 26,159 ‘high-confidence’ genes with homology support from other plant genomes. Abundant alternative splicing, premature termination codons and novel transcriptionally active regions suggest that post-transcriptional processing forms an important regulatory layer. Survey sequences from diverse accessions reveal a landscape of extensive single-nucleotide variation. Our data provide a platform both for genome-assisted research and for enabling contemporary crop improvement.

An integrated high-resolution genetic, physical and shotgun sequence assembly of the barley genome, one of the earliest domesticated and most important crops, is described; it will provide a platform for genome-assisted research and future crop improvement.

Two groups in this issue report the compilation and analysis of the genome sequences of major cereal crops (bread wheat and barley), providing important resources for future crop improvement. Bread wheat accounts for one-fifth of the calories consumed by humankind. It has a very large and complex hexaploid genome of 17 gigabases. Michael Bevan and colleagues have analysed the genome using 454 pyrosequencing and compared it with diploid ancestral and progenitor genomes. The authors discovered significant loss of gene family members upon polyploidization and domestication, and expansion of gene classes that may be associated with crop productivity. Barley is one of the earliest domesticated plant crops. Although diploid, it has a very large genome of 5.1 gigabases. Nils Stein and colleagues describe a physical map anchored to a high-resolution genetic map, on top of which they have overlaid a deep whole-genome shotgun assembly, cDNA and RNA-seq data to provide the first in-depth genome-wide survey of the barley genome.

Genetics

Paleontology

A chromosome conformation capture ordered sequence of the barley genome

Martin Mascher et al. Apr 1, 2017

Cereal grasses of the Triticeae tribe have been the major food source in temperate regions since the dawn of agriculture. Their large genomes are characterized by a high content of repetitive elements and large pericentromeric regions that are virtually devoid of meiotic recombination. Here we present a high-quality reference genome assembly for barley (Hordeum vulgare L.). We use chromosome conformation capture mapping to derive the linear order of sequences across the pericentromeric space and to investigate the spatial organization of chromatin in the nucleus at megabase resolution. The composition of genes and repetitive elements differs between distal and proximal regions. Gene family analyses reveal lineage-specific duplications of genes involved in the transport of nutrients to developing seeds and the mobilization of carbohydrates in grains. We demonstrate the importance of the barley reference sequence for breeding by inspecting the genomic partitioning of sequence variation in modern elite germplasm, highlighting regions vulnerable to genetic erosion.

The International Barley Genome Sequencing Consortium reports sequencing and assembly of a reference genome for barley, Hordeum vulgare.

Triticeae grasses, which include barley, wheat and rye, are widely cultivated plants with particularly complex genomes and evolutionary histories. Sequencing of the barley genome has been especially challenging owing to its large size and genomic features such as an abundance of repetitive elements. Nils Stein and colleagues of the International Barley Genome Sequencing Consortium report sequencing and assembly of a reference genome for barley (Hordeum vulgare L.). They use a combined approach of hierarchical shotgun sequencing of bacterial artificial chromosomes, genome mapping on nanochannel arrays and chromosome-scale scaffolding with Hi-C sequencing. This yields the first comprehensive, completely ordered assembly of the pericentromeric regions of a Triticeae genome. The authors also sequenced and examined genetic diversity in the exomes of 96 European elite barley lines with a spring or winter growth habit, and highlight the utility of this resource for cereal genomics and breeding programs.

Genetics

Molecular Biology

Ensembl 2007

Tim Hubbard et al. Dec 6, 2006

The Ensembl (http://www.ensembl.org/) project provides a comprehensive and integrated source of annotation of chordate genome sequences. Over the past year the number of genomes available from Ensembl has increased from 15 to 33, with the addition of sites for the mammalian genomes of elephant, rabbit, armadillo, tenrec, platypus, pig, cat, bush baby, common shrew, microbat and European hedgehog; the fish genomes of stickleback and medaka; and the second example of the genomes of the sea squirt (Ciona savignyi) and the mosquito (Aedes aegypti). Some of the major features added during the year include the first complete gene sets for genomes with low sequence coverage, the introduction of new strain variation data and the introduction of new orthology/paralogy annotations based on gene trees.

Genetics

Molecular Biology

Assemblathon 1: A competitive assessment of de novo short read assembly methods

Dent Earl et al. Sep 16, 2011

Low-cost short read sequencing technology has revolutionized genomics, though it is only just becoming practical for the high-quality de novo assembly of a novel large genome. We describe the Assemblathon 1 competition, which aimed to comprehensively assess the state of the art in de novo assembly methods when applied to current sequencing technologies. In a collaborative effort, teams were asked to assemble a simulated Illumina HiSeq data set of an unknown, simulated diploid genome. A total of 41 assemblies from 17 different groups were received. Novel haplotype-aware assessments of coverage, contiguity, structure, base calling, and copy number were made. We establish that within this benchmark (1) it is possible to assemble the genome to a high level of coverage and accuracy, and (2) large differences exist between the assemblies, suggesting room for further improvements in current methods. The simulated benchmark, including the correct answer, the assemblies, and the code that was used to evaluate the assemblies, is now public and freely available from http://www.assemblathon.org/.

Genetics

Ecology

Conservation and divergence of gene families encoding components of innate immune response systems in zebrafish

Cornelia Stein et al. Nov 27, 2007

The zebrafish has become a widely used model to study disease resistance and immunity. Although the genes encoding many components of immune signaling pathways have been found in teleost fish, it is not clear whether all components are present or whether the complexity of the signaling mechanisms employed by mammals is similar in fish. We searched the genomes of the zebrafish Danio rerio and two pufferfish for genes encoding components of the Toll-like receptor and interferon signaling pathways, the NLR (NACHT-domain and leucine rich repeat containing) protein family, and related proteins. We find that most of the components known in mammals are also present in fish, with clearly recognizable orthologous relationships. The class II cytokines and their receptors have diverged extensively, obscuring orthologies, but the number of receptors is similar in all species analyzed. In the family of the NLR proteins, the canonical members are conserved. We also found a conserved NACHT-domain protein with WD40 repeats that had previously not been described in mammals. Additionally, we have identified in each of the three fish a large species-specific subgroup of NLR proteins that contain a novel amino-terminal domain that is not found in mammalian genomes. The main innate immune signaling pathways are conserved in mammals and teleost fish. Whereas the components that act downstream of the receptors are highly conserved, with orthologous sets of genes in mammals and teleosts, components that are known or assumed to interact with pathogens are more divergent and have undergone lineage-specific expansions.

Genetics

Immunology

Ensembl 2008

Paul Flicek et al. Nov 14, 2007

The Ensembl project (http://www.ensembl.org) is a comprehensive genome information system featuring an integrated set of genome annotation, databases and other information for chordate and selected model organism and disease vector genomes. As of release 47 (October 2007), Ensembl fully supports 35 species, with preliminary support for six additional species. New species in the past year include platypus and horse. Major additions and improvements to Ensembl since our previous report include extensive support for functional genomics data in the form of a specialized functional genomics database, genome-wide maps of protein–DNA interactions and the Ensembl regulatory build; support for customization of the Ensembl web interface through the addition of user accounts and user groups; and increased support for genome resequencing. We have also introduced new comparative genomics-based data mining options and report on the continued development of our software infrastructure.

Genetics

Molecular Biology

gEVAL — a web-based browser for evaluating genome assemblies

William Chow et al. Apr 7, 2016

Motivation: For most research approaches, genome analyses are dependent on the existence of a high quality genome reference assembly. However, the local accuracy of an assembly remains difficult to assess and improve. The gEVAL browser allows the user to interrogate an assembly in any region of the genome by comparing it to different datasets and evaluating the concordance. These analyses include: a wide variety of sequence alignments, comparative analyses of multiple genome assemblies, and consistency with optical and other physical maps. gEVAL highlights allelic variations, regions of low complexity, abnormal coverage, and potential sequence and assembly errors, and offers strategies for improvement. Although gEVAL focuses primarily on sequence integrity, it can also display arbitrary annotation including from Ensembl or TrackHub sources. We provide gEVAL web sites for many human, mouse, zebrafish and chicken assemblies to support the Genome Reference Consortium, and gEVAL is also downloadable to enable its use for any organism and assembly.

Availability and Implementation: Web Browser: http://geval.sanger.ac.uk, Plugin: http://wchow.github.io/wtsi-geval-plugin.

Contact: kj2@sanger.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.

Genetics

Molecular Biology

Ensembl 2005

Tim Hubbard et al. Dec 17, 2004

The Ensembl (http://www.ensembl.org/) project provides a comprehensive and integrated source of annotation of large genome sequences. Over the last year the number of genomes available from the Ensembl site has increased by 7 to 16, with the addition of the six vertebrate genomes of chimpanzee, dog, cow, chicken, tetraodon and frog and the insect genome of honeybee. The majority have been annotated automatically using the Ensembl gene build system, showing its flexibility to reliably annotate a wide variety of genomes. With the increased number of vertebrate genomes, the comparative analysis provided to users has been greatly improved, with new website interfaces allowing annotation of different genomes to be directly compared. The Ensembl software system is being increasingly widely reused in different projects, showing the benefits of a completely open approach to software development and distribution.

Genetics

Molecular Biology

The European Genome-phenome Archive of human data consented for biomedical research

Ilkka Lappalainen et al. Jun 26, 2015

Paul Flicek and colleagues provide an update on the European Genome-phenome Archive (EGA), a service of the European Bioinformatics Institute (EMBL-EBI) and the Centre for Genomic Regulation (CRG). The authors describe the EGA policies and infrastructure, how access decisions are made, methods for data submission and future plans for expansion of this database.

The European Genome-phenome Archive (EGA) is a permanent archive that promotes the distribution and sharing of genetic and phenotypic data consented for specific approved uses but not fully open, public distribution. The EGA follows strict protocols for information management, data storage, security and dissemination. Authorized access to the data is managed in partnership with the data-providing organizations. The EGA includes major reference data collections for human genetics research.

Genetics

Law
