ResearchHub | Open Science Community

KBase: The United States Department of Energy Systems Biology Knowledgebase

Adam Arkin et al., Jul 6, 2018

To the Editor: Over the past two decades, the scale and complexity of genomics technologies and data have advanced from sequencing genomes of a few organisms to generating metagenomes, genome variation, gene expression, metabolites, and phenotype data for thousands of organisms and their communities. A major challenge in this data-rich age of biology is integrating heterogeneous and distributed data into predictive models of biological function, ranging from a single gene to entire organisms and their ecologies. The US Department of Energy (DOE) has invested substantially in efforts to understand the complex interplay between biological and abiotic processes that influence soil, water, and environmental dynamics of our biosphere. The community that has grown around these efforts recognizes the need for scientists of diverse backgrounds to have access to sophisticated computational tools that enable them to analyze complex and heterogeneous data sets and integrate their data and results effectively with the work of others. In this way, new data and conclusions can be rapidly propagated across existing, related analyses and easily discovered by the community for evaluation and comparison with previous results [1-3]. Here we present the DOE Systems Biology Knowledgebase (KBase, http://kbase.us), an open-source software and data platform that enables data sharing, integration, and analysis of microbes, plants, and their communities. KBase maintains an internal reference database that consolidates information from widely used external data repositories. This includes over 90,000 microbial genomes from RefSeq [4], over 50 plant genomes from Phytozome [5], over 300 Biolog media formulations [6], and more than 30,000 reactions and compounds from KEGG [7], BiGG [8], and MetaCyc [9]. These public data are available for integration with user data where appropriate (e.g., genome comparison or building species trees). KBase links these diverse data types with a range of analytical functions within a web-based user interface. This extensive community resource facilitates large-scale analyses on scalable computing infrastructure and has…

Performance Analysis of High Performance Computing Applications on the Amazon Web Services Cloud

Keith Jackson et al., Nov 1, 2010

Cloud computing has seen tremendous growth, particularly for commercial web applications. The on-demand, pay-as-you-go model creates a flexible and cost-effective means to access compute resources. For these reasons, the scientific computing community has shown increasing interest in exploring cloud computing. However, the underlying implementation and performance of clouds are very different from those at traditional supercomputing centers. It is therefore critical to evaluate the performance of HPC applications in today's cloud environments to understand the tradeoffs inherent in migrating to the cloud. This work represents the most comprehensive evaluation to date comparing conventional HPC platforms to Amazon EC2, using real applications representative of the workload at a typical supercomputing center. Overall results indicate that EC2 is six times slower than a typical mid-range Linux cluster, and twenty times slower than a modern HPC system. The interconnect on the EC2 cloud platform severely limits performance and causes significant variability.

The DOE Systems Biology Knowledgebase (KBase)

Adam Arkin et al., Dec 22, 2016

Abstract: The U.S. Department of Energy Systems Biology Knowledgebase (KBase) is an open-source software and data platform designed to meet the grand challenge of systems biology: predicting and designing biological function from the biomolecular (small scale) to the ecological (large scale). KBase is available for anyone to use, and enables researchers to collaboratively generate, test, compare, and share hypotheses about biological functions; perform large-scale analyses on scalable computing infrastructure; and combine experimental evidence and conclusions that lead to accurate models of plant and microbial physiology and community dynamics. The KBase platform has (1) extensible analytical capabilities that currently include genome assembly, annotation, ontology assignment, comparative genomics, transcriptomics, and metabolic modeling; (2) a web-browser-based user interface that supports building, sharing, and publishing reproducible and well-annotated analyses with integrated data; (3) access to extensive computational resources; and (4) a software development kit allowing the community to add functionality to the system.

The ModelSEED Database for the integration of metabolic annotations and the reconstruction, comparison, and analysis of metabolic models for plants, fungi, and microbes

Samuel Seaver et al., Apr 1, 2020

Introduction: For over ten years, the ModelSEED has been a primary resource for researchers endeavoring to construct draft genome-scale metabolic models based on annotated microbial or plant genomes. As described here, and now being released, the ModelSEED biochemistry database serves as the foundation of biochemical data underlying the ModelSEED and KBase. Objectives: The ModelSEED biochemistry database embodies several properties that, taken together, distinguish it from other published biochemistry resources by being: (i) a database to serve metabolic modeling by including compartmentalization, transport reactions, charged molecules, proton balancing on reactions, and templates for model species; (ii) extensible by the user community, with all data stored in GitHub; and (iii) designed as a biochemical "Rosetta Stone" to facilitate comparison and integration of annotations from many different tools and databases. Methods: The ModelSEED was constructed by combining chemistry from many resources, applying standard transformations to data, identifying overlapping compounds and reactions, and computing thermodynamic properties. The ModelSEED biochemistry is continually tested using flux balance analysis to ensure the biochemical network is modeling-ready and capable of simulating diverse phenotypes. We also develop ontologies designed to aid in comparing and reconciling metabolic reconstructions that differ in how they represent various metabolic pathways. Results: The current ModelSEED includes 33,978 compounds and 36,645 reactions, made available in an extensible set of files on GitHub, and visualized via the web from the ModelSEED and KBase. Conclusion: This database serves as a transparent source of biochemistry data to broadly support mechanistic modeling and data integration.
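Flux balance analysis, mentioned above as the check applied to the ModelSEED biochemistry, is conventionally posed as a linear program. The standard textbook formulation is shown here for orientation only. The abstract does not specify which objective functions or flux bounds ModelSEED's tests actually use.

```latex
\begin{aligned}
\max_{v \in \mathbb{R}^{n}} \quad & c^{\top} v \\
\text{subject to} \quad & S\,v = 0, \\
& v^{\mathrm{lb}} \le v \le v^{\mathrm{ub}}
\end{aligned}
```

Here $S$ is the $m \times n$ stoichiometric matrix ($m$ compounds, $n$ reactions) and $v$ is the vector of reaction fluxes. The bounds $v^{\mathrm{lb}}, v^{\mathrm{ub}}$ encode reaction reversibility and nutrient availability, for example a growth medium. The objective weights $c$ typically select a biomass reaction. The constraint $S\,v = 0$ is the steady-state mass balance that gives the method its name.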

RWRtoolkit: multi-omic network analysis using random walks on multiplex networks in any species

David Kainer et al., Jul 19, 2024

Multiplex multi-omic networks have been used to uncover key insights into genetic and epigenetic mechanisms supporting biofuel production. Here, we introduce RWRtoolkit, a multiplex generation, exploration, and statistical package built for R and command-line users. RWRtoolkit enables the efficient exploration of large and highly complex biological networks generated from custom experimental data and/or from publicly available datasets, and it is species-agnostic. A range of functions can be used to find topological distances between biological entities, determine relationships within sets of interest, search for topological context around sets of interest, and statistically evaluate the strength of relationships within and between sets. The command-line interface is designed for parallelisation on high-performance cluster systems, which enables high-throughput analysis such as permutation testing. Several tools in the package have also been made available for use in reproducible workflows via the KBase web application.
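As its name suggests, RWRtoolkit is built around random walk with restart (RWR). In RWR a walker starting from a set of seed nodes repeatedly moves to a random neighbour. With probability r at each step it instead jumps back to the seeds. The steady state p = (1 - r) W p + r p0, where W is the column-normalised adjacency matrix, then scores every node by its network proximity to the seeds. The Python sketch below illustrates the idea on a single-layer, undirected network only. It is not RWRtoolkit's R API and does not handle multiplex layers. The function name `random_walk_with_restart`, the restart value of 0.7, and the toy gene names are assumptions chosen for illustration.

```python
from typing import Dict, Iterable, Set


def random_walk_with_restart(
    graph: Dict[str, Set[str]],
    seeds: Iterable[str],
    restart: float = 0.7,  # illustrative restart probability, not RWRtoolkit's default
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> Dict[str, float]:
    """Steady-state visit probabilities of a random walk with restart.

    Hypothetical helper for illustration. Assumes an undirected graph given as
    an adjacency dict (every neighbour is also a key) with no isolated nodes,
    so that total probability mass stays equal to 1.
    """
    seed_set = set(seeds)
    if not seed_set or not seed_set.issubset(graph):
        raise ValueError("seeds must be a non-empty subset of the graph's nodes")
    if any(not nbrs for nbrs in graph.values()):
        raise ValueError("isolated nodes are not supported in this sketch")

    # Restart vector p0: uniform over the seed nodes.
    p0 = {n: (1.0 / len(seed_set) if n in seed_set else 0.0) for n in graph}
    p = dict(p0)
    for _ in range(max_iter):
        # Apply the column-normalised transition matrix W: each node splits
        # its current probability evenly among its neighbours.
        spread = {n: 0.0 for n in graph}
        for node, neighbours in graph.items():
            share = p[node] / len(neighbours)
            for nb in neighbours:
                spread[nb] += share
        # p_{t+1} = (1 - r) * W p_t + r * p0
        p_next = {n: (1.0 - restart) * spread[n] + restart * p0[n] for n in graph}
        delta = sum(abs(p_next[n] - p[n]) for n in graph)
        p = p_next
        if delta < tol:  # converged in L1 norm
            break
    return p


# Hypothetical toy network: four genes connected in a chain.
toy = {
    "geneA": {"geneB"},
    "geneB": {"geneA", "geneC"},
    "geneC": {"geneB", "geneD"},
    "geneD": {"geneC"},
}
scores = random_walk_with_restart(toy, seeds=["geneA"])
for node, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{node}\t{score:.4f}")
```

On the chain above, scores fall with distance from the seed `geneA`. This is the kind of topological proximity signal the abstract describes. A multiplex version would add transitions between layers, which this sketch deliberately omits.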
