ResearchHub | Open Science Community

BUSCO Update: Novel and Streamlined Workflows along with Broader and Deeper Phylogenetic Coverage for Scoring of Eukaryotic, Prokaryotic, and Viral Genomes

Mosè Manni et al., Jul 25, 2021

Methods for evaluating the quality of genomic and metagenomic data are essential to aid genome assembly procedures and to correctly interpret the results of subsequent analyses. BUSCO estimates the completeness and redundancy of processed genomic data based on universal single-copy orthologs. Here, we present new functionalities and major improvements of the BUSCO software, as well as the renewal and expansion of the underlying data sets in sync with the OrthoDB v10 release. Among the major novelties, BUSCO now enables phylogenetic placement of the input sequence to automatically select the most appropriate BUSCO data set for the assessment, allowing the analysis of metagenome-assembled genomes of unknown origin. A newly introduced genome workflow improves efficiency and runtimes, especially on large eukaryotic genomes. BUSCO is the only tool capable of assessing both eukaryotic and prokaryotic species, and can be applied to various data types, from genome assemblies and metagenomic bins, to transcriptomes and gene sets.
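As a rough illustration of how completeness and redundancy can be read from marker counts, the sketch below turns four tallies (single-copy, duplicated, fragmented, missing) into percentages and a one-line summary in the C/S/D/F/M notation commonly seen in BUSCO reports. The class name `BuscoCounts`, the function `summarize`, and the example counts are hypothetical and chosen for illustration only; this is not BUSCO's own code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuscoCounts:
    """Hypothetical container for marker tallies (illustrative, not BUSCO's API)."""
    single: int       # complete markers found exactly once
    duplicated: int   # complete markers found more than once (redundancy)
    fragmented: int   # markers only partially recovered
    missing: int      # markers not found at all

    @property
    def total(self) -> int:
        return self.single + self.duplicated + self.fragmented + self.missing


def summarize(counts: BuscoCounts) -> str:
    """Format the tallies as percentages of the total number of markers."""
    n = counts.total
    if n == 0:
        raise ValueError("at least one marker is required")

    def pct(x: int) -> float:
        return 100.0 * x / n

    complete = counts.single + counts.duplicated
    return (
        f"C:{pct(complete):.1f}%"
        f"[S:{pct(counts.single):.1f}%,D:{pct(counts.duplicated):.1f}%],"
        f"F:{pct(counts.fragmented):.1f}%,M:{pct(counts.missing):.1f}%,n:{n}"
    )


# Made-up counts, for illustration only.
print(summarize(BuscoCounts(single=90, duplicated=5, fragmented=3, missing=2)))
```

In this reading, a high share of complete markers suggests a complete data set, while a high duplicated share points to redundancy.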

Genetics

Molecular Biology

0

Paper

BUSCO Applications from Quality Assessments to Gene Prediction and Phylogenomics

Robert Waterhouse et al., Dec 5, 2017

Genomics promises comprehensive surveying of genomes and metagenomes, but rapidly changing technologies and expanding data volumes make evaluation of completeness a challenging task. Technical sequencing quality metrics can be complemented by quantifying completeness of genomic data sets in terms of the expected gene content of Benchmarking Universal Single-Copy Orthologs (BUSCO, http://busco.ezlab.org). The latest software release implements a complete refactoring of the code to make it more flexible and extendable to facilitate high-throughput assessments. The original six lineage assessment data sets have been updated with improved species sampling, 34 new subsets have been built for vertebrates, arthropods, fungi, and prokaryotes that greatly enhance resolution, and data sets are now also available for nematodes, protists, and plants. Here, we present BUSCO v3 with example analyses that highlight the wide-ranging utility of BUSCO assessments, which extend beyond quality control of genomics data sets to applications in comparative genomics analyses, gene predictor training, metagenomics, and phylogenomics.

Genetics

Molecular Biology

1

Paper

BUSCO: Assessing Genome Assembly and Annotation Completeness

Mathieu Seppey et al., Jan 1, 2019

Genomics drives the current progress in molecular biology, generating unprecedented volumes of data. The scientific value of these sequences depends on the ability to evaluate their completeness using a biologically meaningful approach. Here, we describe the use of the BUSCO tool suite to assess the completeness of genomes, gene sets, and transcriptomes, using their gene content as a complementary method to common technical metrics. The chapter introduces the concept of universal single-copy genes, which underlies the BUSCO methodology, covers the basic requirements to set up the tool, and provides guidelines to properly design the analyses, run the assessments, and interpret and utilize the results.

Genetics

History

0

Paper

BUSCO: Assessing Genomic Data Quality and Beyond

Mosè Manni et al., Dec 1, 2021

Evaluation of the quality of genomic "data products" such as genome assemblies or gene sets is of critical importance in order to recognize possible issues and correct them during the generation of new data. It is equally essential to guide subsequent or comparative analyses with existing data, as the correct interpretation of the results necessarily requires knowledge about the quality level and reliability of the inputs. Using datasets of near universal single-copy orthologs derived from OrthoDB, BUSCO can estimate the completeness and redundancy of genomic data by providing biologically meaningful metrics based on expected gene content. These can complement technical metrics such as contiguity measures (e.g., number of contigs/scaffolds, and N50 values). Here, we describe the use of the BUSCO tool suite to assess different data types that can range from genome assemblies of single isolates and assembled transcriptomes and annotated gene sets to metagenome-assembled genomes where the taxonomic origin of the species is unknown. BUSCO is the only tool capable of assessing all these types of sequences from both eukaryotic and prokaryotic species. The protocols detail the various BUSCO running modes and the novel workflows introduced in versions 4 and 5, including the batch analysis on multiple inputs, the auto-lineage workflow to run assessments without specifying a dataset, and a workflow for the evaluation of (large) eukaryotic genomes. The protocols further cover the BUSCO setup, guidelines to interpret the results, and BUSCO "plugin" workflows for performing common operations in genomics using BUSCO results, such as building phylogenomic trees and visualizing syntenies. © 2021 The Authors. Current Protocols published by Wiley Periodicals LLC. 
Basic Protocol 1: Assessing an input sequence with a BUSCO dataset specified manually
Basic Protocol 2: Assessing an input sequence with a dataset automatically selected by BUSCO
Basic Protocol 3: Assessing multiple inputs
Alternate Protocol: Decreasing analysis runtime when assessing a large number of small genomes with BUSCO auto-lineage workflow and Snakemake
Support Protocol 1: BUSCO setup
Support Protocol 2: Visualizing BUSCO results
Support Protocol 3: Building phylogenomic trees
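
For context on the contiguity measures that the abstract contrasts with gene-content metrics, here is a minimal sketch of N50 under its usual definition: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the assembly size. The function name `n50` and the example lengths are illustrative assumptions, not part of BUSCO.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)  # longest first
    if not ordered:
        raise ValueError("at least one contig length is required")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise AssertionError("unreachable: running total always reaches half")


# Made-up contig lengths: total is 20, and 6 + 5 = 11 first reaches 10.
print(n50([2, 3, 4, 5, 6]))
```

N50 describes only how the assembly is broken into pieces; it says nothing about whether the expected genes are present, which is why the two kinds of metric complement each other.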

Genetics

Molecular Biology

0

Paper

OrthoDB v9.1: cataloging evolutionary and functional annotations for animal, fungal, plant, archaeal, bacterial and viral orthologs

Evgeny Zdobnov et al., Nov 8, 2016

OrthoDB is a comprehensive catalog of orthologs, genes inherited by extant species from a single gene in their last common ancestor. In 2016 OrthoDB reached its 9th release, growing to over 22 million genes from over 5000 species, now adding plants, archaea and viruses. In this update we focused on usability of this fast-growing wealth of data: updating the user and programmatic interfaces to browse and query the data, and further enhancing the already extensive integration of available gene functional annotations. Collating functional annotations from over 100 resources enabled us to propose descriptive titles for 87% of ortholog groups. Additionally, OrthoDB continues to provide computed evolutionary annotations and to allow user queries by sequence homology. The OrthoDB resource now enables users to generate publication-quality comparative genomics charts, as well as to upload, analyze and interactively explore their own private data. OrthoDB is available from http://orthodb.org.

Genetics

Molecular Biology

0

Paper

OrthoDB v11: annotation of orthologs in the widest sampling of organismal diversity

Dmitry Kuznetsov et al., Oct 26, 2022

OrthoDB provides evolutionary and functional annotations of genes in a diverse sampling of eukaryotes, prokaryotes, and viruses. Genomics continues to accelerate our exploration of gene diversity and orthology is the most precise way of bridging gene functional knowledge with the rapidly expanding universe of genomic sequences. OrthoDB samples the most diverse organisms with the best quality genomics data to provide the leading coverage of species diversity. This update of the underlying data to over 18 000 prokaryotes and almost 2000 eukaryotes with over 100 million genes propels the coverage to another level. This achievement also demonstrates the scalability of the underlying OrthoLoger software for delineation of orthologs, freely available from https://orthologer.ezlab.org. In addition to the ab-initio computations of gene orthology used for the OrthoDB release, the OrthoLoger software allows mapping of novel gene sets to precomputed orthologs and thereby links to their annotations. The LEMMI-style benchmarking of OrthoLoger ensures its state-of-the-art performance and is available from https://lemortho.ezlab.org. The OrthoDB web interface has been further developed to include a pairwise orthology view from any gene to any other sampled species. OrthoDB-computed evolutionary annotations as well as extensively collated functional annotations can be accessed via REST API or SPARQL/RDF, downloaded or browsed online from https://www.orthodb.org.

Genetics

Ecology

0

Paper

The Bgee suite: integrated curated expression atlas and comparative transcriptomics in animals

Frederic Bastian et al., May 29, 2020

Bgee is a database to retrieve and compare gene expression patterns in multiple animal species, produced by integrating multiple data types (RNA-Seq, Affymetrix, in situ hybridization, and EST data). It is based exclusively on curated healthy wild-type expression data (e.g., no gene knock-out, no treatment, no disease), to provide a comparable reference of normal gene expression. Curation includes very large datasets such as GTEx (re-annotation of samples as “healthy” or not) as well as many small ones. Data are integrated and made comparable between species thanks to consistent data annotation and processing, and to calls of presence/absence of expression, along with expression scores. As a result, Bgee is capable of detecting the conditions of expression of any single gene, accommodating any data type and species. Bgee provides several tools for analyses, allowing, e.g., automated comparisons of gene expression patterns within and between species, retrieval of the preferred conditions of expression of any gene, or enrichment analyses of conditions with expression of sets of genes. Bgee release 14.1 includes 29 animal species, and is available at https://bgee.org/ and through its Bioconductor R package BgeeDB.

Genetics

Molecular Biology

50

Paper

Genomic signatures accompanying the dietary shift to phytophagy in polyphagan beetles

Mathieu Seppey et al., Aug 26, 2018

The diversity and evolutionary success of beetles (Coleoptera) are proposed to have arisen from millions of years of specialized trophic interactions with land plants. In particular, ingestion of toxic plant allelochemicals may impose selective pressures that drive genomic diversification and speciation in phytophagous beetles. However, evidence of changes in beetle gene repertoires driven by these interactions remains largely anecdotal and without explicit hypothesis testing. To address this, we explored the genomic consequences of beetle-plant trophic interactions by performing comparative gene family analyses across 18 species representing the two most speciose beetle suborders. By contrasting gene content of species from the phytophagous-rich suborder Polyphaga with those of the mainly predatory Adephaga, we identified families of detoxification enzymes that underwent adaptive expansions in Polyphaga. These genomic signatures that accompany the dietary shift to phytophagy in polyphagan beetles suggest a key role for interactions with plant chemical defenses in driving beetle diversification.

Genetics

Ecology

0

Paper

LEMMI: A continuous benchmarking platform for metagenomics classifiers

Mathieu Seppey et al., Dec 28, 2018

Studies of microbiomes are booming, as is the diversity of computational tools to make sense of the sequencing data and the volumes of accumulated microbial genotypes. LEMMI ( ) is a novel concept of a benchmarking platform of computational tools for metagenome composition assessments that introduces a continuous integration of tools, their multi-objective ranking, and an effective distribution through software containers. Here, we detail the workflow and discuss the evaluation of some recently released methods. We see this platform eventually as a community-driven effort, where method developers can showcase novel approaches and get unbiased benchmarks for publications, while users can make informed choices and obtain standardized and easy-to-use tools.

Biochemistry

Microbiology

0

Paper

BUSCO applications from quality assessments to gene prediction and phylogenomics

Robert Waterhouse et al., Aug 17, 2017

Genomics promises comprehensive surveying of genomes and metagenomes, but rapidly changing technologies and expanding data volumes make evaluation of completeness a challenging task. Technical sequencing quality metrics can be complemented by quantifying completeness in terms of the expected gene content of Benchmarking Universal Single-Copy Orthologs (BUSCO, http://busco.ezlab.org). Now in its third release, BUSCO utilities extend beyond quality control to applications in comparative genomics, gene predictor training, metagenomics, and phylogenomics.

Genetics

Philosophy

0

Paper
