ResearchHub | Open Science Community

The Gene Ontology (GO) database and informatics resource

Midori Harris et al. Dec 17, 2003

The Gene Ontology (GO) project (http://www.geneontology.org/) provides structured, controlled vocabularies and classifications that cover several domains of molecular and cellular biology and are freely available for community use in the annotation of genes, gene products and sequences. Many model organism databases and genome annotation groups use the GO and contribute their annotation sets to the GO resource. The GO database integrates the vocabularies and contributed annotations and provides full access to this information in several formats. Members of the GO Consortium continually work collectively, involving outside experts as needed, to expand and update the GO vocabularies. The GO Web resource also provides access to extensive documentation about the GO project and links to applications that use GO data for functional analyses.

Genetics

Philosophy

The genome of woodland strawberry (Fragaria vesca)

Vladimir Shulaev et al. Dec 26, 2010

The International Strawberry Sequencing Consortium reports the draft genome of the woodland strawberry (Fragaria vesca). The genome of this diploid species should serve as a reference genome for the Fragaria genus, as the cultivated strawberry (Fragaria × ananassa) is an octoploid where F. vesca is predicted to be a subgenome donor. The woodland strawberry, Fragaria vesca (2n = 2x = 14), is a versatile experimental plant system. This diminutive herbaceous perennial has a small genome (240 Mb), is amenable to genetic transformation and shares substantial sequence identity with the cultivated strawberry (Fragaria × ananassa) and other economically important rosaceous plants. Here we report the draft F. vesca genome, which was sequenced to ×39 coverage using second-generation technology, assembled de novo and then anchored to the genetic linkage map into seven pseudochromosomes. This diploid strawberry sequence lacks the large genome duplications seen in other rosids. Gene prediction modeling identified 34,809 genes, with most being supported by transcriptome mapping. Genes critical to valuable horticultural traits including flavor, nutritional value and flowering time were identified. Macrosyntenic relationships between Fragaria and Prunus predict a hypothetical ancestral Rosaceae genome that had nine chromosomes. New phylogenetic analysis of 154 protein-coding genes suggests that assignment of Populus to Malvidae, rather than Fabidae, is warranted.

Genetics

Molecular Biology

The genome of Eucalyptus grandis

Alexander Myburg et al. Jun 11, 2014

Eucalypts are the world’s most widely planted hardwood trees. Their outstanding diversity, adaptability and growth have made them a global renewable resource of fibre and energy. We sequenced and assembled >94% of the 640-megabase genome of Eucalyptus grandis. Of 36,376 predicted protein-coding genes, 34% occur in tandem duplications, the largest proportion thus far in plant genomes. Eucalyptus also shows the highest diversity of genes for specialized metabolites such as terpenes that act as chemical defence and provide unique pharmaceutical oils. Genome sequencing of the E. grandis sister species E. globulus and a set of inbred E. grandis tree genomes reveals dynamic genome evolution and hotspots of inbreeding depression. The E. grandis genome is the first reference for the eudicot order Myrtales and is placed here sister to the eurosids. This resource expands our understanding of the unique biology of large woody perennials and provides a powerful tool to accelerate comparative biology, breeding and biotechnology. The Eucalyptus grandis genome has been sequenced, revealing the greatest number of tandem duplications of any plant genome sequenced so far, and the highest diversity of genes for specialized metabolites that act as chemical defence and provide unique pharmaceutical oils; genome sequencing of the sister species E. globulus and a set of inbred E. grandis tree genomes reveals dynamic genome evolution and hotspots of inbreeding depression. Fast-growing Eucalyptus trees form the basis of an international pulp, paper and chemical cellulose industry and they are also seen as potential biomass feedstocks for bioenergy and biomaterials. The genome of Eucalyptus grandis has now been sequenced. It contains the greatest number of tandem duplications so far found in a plant genome, as well as the highest diversity of genes for specialized metabolites that act as chemical defence and provide unique pharmaceutical oils. Comparison with the sister species E. globulus and with other E. grandis lines reveals dynamic genome evolution and hotspots of inbreeding depression. The availability of comprehensive genomic data will be of use in work on accelerating breeding cycles for productivity and wood quality and developing eucalypt strains suited to a variety of habitats.

Genetics

Molecular Biology

Expression Atlas update—an integrated database of gene and protein expression in humans, animals and plants

Robert Petryszak et al. Oct 19, 2015

Expression Atlas (http://www.ebi.ac.uk/gxa) provides information about gene and protein expression in animal and plant samples of different cell types, organism parts, developmental stages, diseases and other conditions. It consists of selected microarray and RNA-sequencing studies from ArrayExpress, which have been manually curated, annotated with ontology terms, checked for high quality and processed using standardised analysis methods. Since the last update, Atlas has grown seven-fold (1572 studies as of August 2015), and incorporates baseline expression profiles of tissues from Human Protein Atlas, GTEx and FANTOM5, and of cancer cell lines from ENCODE, CCLE and Genentech projects. Plant studies constitute a quarter of Atlas data. For genes of interest, the user can view baseline expression in tissues, and differential expression for biologically meaningful pairwise comparisons, estimated using consistent methodology across all of Atlas. Our first proteomics study in human tissues is now displayed alongside transcriptomics data in the same tissues. Novel analyses and visualisations include: 'enrichment' in each differential comparison of GO terms, Reactome, Plant Reactome pathways and InterPro domains; hierarchical clustering (by baseline expression) of most variable genes and experimental conditions; and, for a given gene-condition, distribution of baseline expression across biological replicates.

Genetics

Paleontology

Ensembl Genomes 2020—enabling non-vertebrate genomic research

Kevin Howe et al. Oct 2, 2019

Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the context of the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of interfaces to genomic data across the tree of life, including reference genome sequence, gene models, transcriptional data, genetic variation and comparative analysis. Data may be accessed via our website, online tools platform and programmatic interfaces, with updates made four times per year (in synchrony with Ensembl). Here, we provide an overview of Ensembl Genomes, with a focus on recent developments. These include the continued growth, more robust and reproducible sets of orthologues and paralogues, and enriched views of gene expression and gene function in plants. Finally, we report on our continued deeper integration with the Ensembl project, which forms a key part of our future strategy for dealing with the increasing quantity of available genome-scale data across the tree of life.

Genetics

Paleontology

Expression Atlas: gene and protein expression across multiple studies and organisms

Irene Papatheodorou et al. Nov 6, 2017

Expression Atlas (http://www.ebi.ac.uk/gxa) is an added value database that provides information about gene and protein expression in different species and contexts, such as tissue, developmental stage, disease or cell type. The available public and controlled access data sets from different sources are curated and re-analysed using standardized, open source pipelines and made available for queries, download and visualization. As of August 2017, Expression Atlas holds data from 3,126 studies across 33 different species, including 731 from plants. Data from large-scale RNA sequencing studies including Blueprint, PCAWG, ENCODE, GTEx and HipSci can be visualized next to each other. In Expression Atlas, users can query genes or gene-sets of interest and explore their expression across or within species, tissues, developmental stages in a constitutive or differential context, representing the effects of diseases, conditions or experimental interventions. All processed data matrices are available for direct download in tab-delimited format or as R-data. In addition to the web interface, data sets can now be searched and downloaded through the Expression Atlas R package. Novel features and visualizations include the on-the-fly analysis of gene set overlaps and the option to view gene co-expression in experiments investigating constitutive gene expression across tissues or other conditions.

Genetics

Paleontology

Ensembl Genomes 2022: an expanding genome resource for non-vertebrates

Andy Yates et al. Nov 10, 2021

Ensembl Genomes (https://www.ensemblgenomes.org) provides access to non-vertebrate genomes and analysis complementing vertebrate resources developed by the Ensembl project (https://www.ensembl.org). The two resources collectively present genome annotation through a consistent set of interfaces spanning the tree of life presenting genome sequence, annotation, variation, transcriptomic data and comparative analysis. Here, we present our largest increase in plant, metazoan and fungal genomes since the project's inception creating one of the world's most comprehensive genomic resources and describe our efforts to reduce genome redundancy in our Bacteria portal. We detail our new efforts in gene annotation, our emerging support for pangenome analysis, our efforts to accelerate data dissemination through the Ensembl Rapid Release resource and our new AlphaFold visualization. Finally, we present details of our future plans including updates on our integration with Ensembl, and how we plan to improve our support for the microbial research community. Software and data are made available without restriction via our website, online tools platform and programmatic interfaces (available under an Apache 2.0 license). Data updates are synchronised with Ensembl's release cycle.

Genetics

Ecology

Nomenclature report on rice WRKY's - Conflict regarding gene names and its solution

Qingxi Shen et al. Feb 27, 2012

Background: Since whole genome sequences of rice were made publicly accessible, the number of articles on new rice genes has increased remarkably. The Committee on Gene Symbolization, Nomenclature and Linkage (CGSNL) of the Rice Genetics Cooperative published the gene nomenclature system for rice and encouraged researchers to follow the rules before publishing their results. The CGSNL provides an on-line registration system for newly identified rice genes to prevent conflicts and/or duplication of gene names in journal articles. Findings: Recently, the CGSNL surveyed genes in the rice WRKY family in published journal articles and found several duplicated gene names. Conclusions: To discuss and resolve inconsistencies in WRKY gene nomenclature, the rice WRKY working group was established and redefined the nomenclature. This report announces the conclusion.

Genetics

Molecular Biology

Finding Our Way through Phenotypes

Andrew Deans et al. Jan 6, 2015

Despite a large and multifaceted effort to understand the vast landscape of phenotypic data, their current form inhibits productive data analysis. The lack of a community-wide, consensus-based, human- and machine-interpretable language for describing phenotypes and their genomic and environmental contexts is perhaps the most pressing scientific bottleneck to integration across many key fields in biology, including genomics, systems biology, development, medicine, evolution, ecology, and systematics. Here we survey the current phenomics landscape, including data resources and handling, and the progress that has been made to accurately capture relevant data descriptions for phenotypes. We present an example of the kind of integration across domains that computable phenotypes would enable, and we call upon the broader biology community, publishers, and relevant funding agencies to support efforts to surmount today's data barriers and facilitate analytical reproducibility.

Genetics

Ecology

A cost-effective maize ear phenotyping platform enables rapid categorization and quantification of kernels

Cedar Warman et al. Jul 12, 2020

High-throughput phenotyping systems are powerful, dramatically changing our ability to document, measure, and detect biological phenomena. Here, we describe a cost-effective combination of a custom-built imaging platform and deep-learning-based computer vision pipeline. A minimal version of the maize ear scanner was built with low-cost and readily available parts. The scanner rotates a maize ear while a cellphone or digital camera captures a video of the surface of the ear. Videos are then digitally flattened into two-dimensional ear projections. Segregating GFP and anthocyanin kernel phenotypes are clearly distinguishable in ear projections, and can be manually annotated using image analysis software. Increased throughput was attained by designing and implementing an automated kernel counting system using transfer learning and a deep learning object detection model. The computer vision model was able to rapidly assess over 390,000 kernels, identifying male-specific transmission defects across a wide range of GFP-marked mutant alleles. This includes a previously undescribed defect putatively associated with mutation of Zm00001d002824, a gene predicted to encode a vacuolar processing enzyme (VPE). We show that by using this system, the quantification of transmission data and other ear phenotypes can be accelerated and scaled to generate large datasets for robust analyses. One-sentence summary: A maize ear phenotyping system built from commonly available parts creates images of the surface of ears and identifies kernel phenotypes with a deep-learning-based computer vision pipeline.

Genetics

Artificial Intelligence
