ResearchHub | Open Science Community

High-performance web services for querying gene and variant annotation

Jiwen Xin et al., May 6, 2016

Efficient tools for data management and integration are essential for many aspects of high-throughput biology. In particular, annotations of genes and human genetic variants are commonly used but highly fragmented across many resources. Here, we describe MyGene.info and MyVariant.info, high-performance web services for querying gene and variant annotation information. These web services are currently accessed more than three million times per month. They also demonstrate a generalizable cloud-based model for organizing and querying biological annotation information. MyGene.info and MyVariant.info are provided as high-performance web services, accessible at http://mygene.info and http://myvariant.info. Both are offered free of charge to the research community.
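As a rough illustration of what querying these services might look like, the sketch below only builds request URLs and makes no network call. The version prefixes (`/v3`, `/v1`), the `query` and `variant` paths, and the `q`, `species`, and `fields` parameters are assumptions not stated in the abstract, and the helper names are hypothetical; check the services' own documentation before relying on them.

```python
from urllib.parse import quote, urlencode

# Assumed base URLs and version prefixes; not given in the abstract.
MYGENE_BASE = "https://mygene.info/v3"
MYVARIANT_BASE = "https://myvariant.info/v1"


def gene_query_url(term, species="human", fields="symbol,name,entrezgene"):
    """Build a gene search URL (hypothetical helper; parameters are assumptions)."""
    params = {"q": term, "species": species, "fields": fields}
    return f"{MYGENE_BASE}/query?{urlencode(params)}"


def variant_url(hgvs_id):
    """Build a variant lookup URL from an HGVS-style identifier (hypothetical helper)."""
    # Keep ':' readable; percent-encode characters such as '>'.
    return f"{MYVARIANT_BASE}/variant/{quote(hgvs_id, safe=':')}"


if __name__ == "__main__":
    print(gene_query_url("cdk2"))
    print(variant_url("chr7:g.140453134T>C"))
```

The resulting URLs could then be fetched with any HTTP client; the response format is not described in the abstract and is left out of this sketch.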

Genetics

Molecular Biology

Wikidata as a FAIR knowledge graph for the life sciences

Andra Waagmeester et al., Oct 21, 2019

Wikidata is a community-maintained knowledge base that epitomizes the FAIR principles of Findability, Accessibility, Interoperability, and Reusability. Here, we describe the breadth and depth of biomedical knowledge contained within Wikidata, assembled from primary knowledge repositories on genomics, proteomics, genetic variants, pathways, chemical compounds, and diseases. We built a collection of open-source tools that simplify the addition and synchronization of Wikidata with source databases. We furthermore demonstrate several use cases of how the continuously updated, crowd-contributed knowledge in Wikidata can be mined. These use cases cover a diverse cross section of biomedical analyses, from crowdsourced curation of biomedical ontologies, to phenotype-based diagnosis of disease, to drug repurposing.
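To make the idea of mining Wikidata concrete, here is a minimal sketch that builds a SPARQL query looking up a gene item by its Entrez Gene ID, plus a URL for Wikidata's public query endpoint. The endpoint URL, the use of property `P351` for Entrez Gene ID, and the helper names are assumptions for illustration; they are not taken from the abstract, and nothing is sent over the network.

```python
from urllib.parse import urlencode

# Assumed public SPARQL endpoint for Wikidata.
WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"


def gene_by_entrez_query(entrez_id):
    """Return a SPARQL query string for a gene item (hypothetical helper)."""
    entrez_id = str(entrez_id)
    if not entrez_id.isdigit():
        raise ValueError("Entrez Gene IDs are expected to be numeric")
    # P351 is assumed to be the "Entrez Gene ID" property.
    return (
        "SELECT ?gene ?geneLabel WHERE { "
        f'?gene wdt:P351 "{entrez_id}" . '
        'SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . } '
        "}"
    )


def sparql_url(query):
    """Build a GET URL asking the endpoint for JSON results."""
    return f"{WIKIDATA_SPARQL}?{urlencode({'query': query, 'format': 'json'})}"


if __name__ == "__main__":
    print(sparql_url(gene_by_entrez_query(1017)))
```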

Molecular Biology

Information Systems

A comprehensive and scalable database search system for metaproteomics

Sandip Chatterjee et al., May 18, 2016

Background: Mass spectrometry-based shotgun proteomics experiments rely on accurate matching of experimental spectra against a database of protein sequences. Existing computational analysis methods are limited in the size of their sequence databases, which severely restricts the proteomic sequencing depth and functional analysis of highly complex samples. The growing amount of public high-throughput sequencing data will only exacerbate this problem. We designed a broadly applicable metaproteomic analysis method (ComPIL) that addresses protein database size limitations.
Results: Our approach to overcome this significant limitation in metaproteomics was to design a scalable set of sequence databases assembled for optimal library querying speeds. ComPIL was integrated with a modified version of the search engine ProLuCID (termed "Blazmass") to permit rapid matching of experimental spectra. Proof-of-principle analysis of human HEK293 lysate with a ComPIL database derived from high-quality genomic libraries was able to detect nearly all of the same peptides as a search with a human database (~500x fewer peptides in the database), with a small reduction in sensitivity. We were also able to detect proteins from the adenovirus used to immortalize these cells. We applied our method to a set of healthy human gut microbiome proteomic samples and showed a substantial increase in the number of identified peptides and proteins compared to previous metaproteomic analyses, while retaining a high degree of protein identification accuracy, and allowing for a more in-depth characterization of the functional landscape of the samples.
Conclusions: The combination of ComPIL with Blazmass allows proteomic searches to be performed with database sizes much larger than previously possible. These large database searches can be applied to complex meta-samples with unknown composition or proteomic samples where unexpected proteins may be identified.
The protein database, proteomics search engine, and the proteomic data files for the 5 microbiome samples characterized and discussed herein are open source and available for use and additional analysis.

Biochemistry

Molecular Biology

Metaproteomics of colonic microbiota unveils discrete protein functions among colitic mice and control groups

Clara Moon et al., Nov 15, 2017

Metaproteomics can greatly assist established high-throughput sequencing methodologies to provide systems biological insights into the alterations of microbial protein functionalities correlated with disease-associated dysbiosis of the intestinal microbiota. Here, we utilized the well-characterized murine T cell transfer model of colitis to find specific changes within the intestinal luminal proteome associated with inflammation. MS proteomic analysis of colonic samples permitted the identification of ~10,000-12,000 unique peptides that corresponded to 5,610 protein clusters identified across three groups, including the colitic Rag1-/- T cell recipients, isogenic Rag1-/- controls, and wild-type mice. We demonstrate that the colitic mice exhibited a significant increase in Proteobacteria and Verrucomicrobia and show that such alterations in the microbial communities contributed to the enrichment of specific proteins with transcription and translation gene ontology terms. In combination with 16S sequencing, our metaproteomics-based microbiome studies provide a foundation for assessing alterations in intestinal luminal protein functionalities in a robust and well-characterized mouse model of colitis, and set the stage for future studies to further explore the functional mechanisms of altered protein functionalities associated with dysbiosis and inflammation.

Genetics

Immunology

MyGene.info and MyVariant.info: Gene and Variant Annotation Query Services

Jiwen Xin et al., Dec 30, 2015

MyGene.info and MyVariant.info provide high-performance data APIs for querying gene and variant annotation information. They demonstrate a new model for organizing biological annotation information by utilizing a cloud-based scalable infrastructure. MyGene.info and MyVariant.info can be accessed at http://mygene.info and http://myvariant.info.

Artificial Intelligence

Molecular Biology

Structured Reviews for Data and Knowledge Driven Research

Núria Queralt-Rosiñach et al., Aug 12, 2019

Motivation: Hypothesis generation is a critical step in research and a cornerstone in the rare disease field. Research is most efficient when those hypotheses are based on the entirety of knowledge known to date. Systematic review articles are commonly used in biomedicine to summarize existing knowledge and contextualize experimental data. But the information contained within review articles is typically only expressed as free-text, which is difficult to use computationally. Researchers struggle to navigate, collect and remix prior knowledge as it is scattered in several silos without seamless integration and access. This lack of a structured information framework hinders research by both experimental and computational scientists. Results: To better organize knowledge and data, we built a structured review article that is specifically focused on NGLY1 Deficiency, an ultra-rare genetic disease first reported in 2012. We represented this structured review as a knowledge graph, and then stored this knowledge graph in a Neo4j database to simplify dissemination, querying, and visualization of the network. Relative to free-text, this structured review better promotes the principles of findability, accessibility, interoperability, and reusability (FAIR). In collaboration with domain experts in NGLY1 Deficiency, we demonstrate how this resource can improve the efficiency and comprehensiveness of hypothesis generation. We also developed a read-write interface that allows domain experts to contribute FAIR structured knowledge to this community resource. In contrast to traditional free-text review articles, this structured review exists as a living knowledge graph that is curated by humans and accessible to computational analyses. Finally, we have generalized this workflow into modular and repurposable components that can be applied to other domain areas. This NGLY1 Deficiency-focused network is publicly available at http://ngly1graph.org/. 
Availability and implementation: Source code and network data files are at: https://github.com/SuLab/ngly1-graph and https://github.com/SuLab/bioknowledge-reviewer.

Genetics

Artificial Intelligence

Quantitative metaproteomics of patient fecal microbiota identifies host and microbial proteins associated with ulcerative colitis

Peter Thuy-Boun et al., Nov 11, 2020

Mass spectrometry-based metaproteomics technologies enable the direct observation of proteins within complex multi-organism environments. A major hurdle in mapping metaproteomic fragmentation spectra to their corresponding peptides is the need for large peptide databases encompassing all anticipated species contained within a biological sample. As we cannot predict the taxonomic composition of microbiomes a priori, we developed the ComPIL database which contains a comprehensive collection of 4.8 billion unique peptides from public sequencing repositories to enable our proteomics analyses. We analyzed fecal samples from ulcerative colitis (UC) patients using a tandem mass spectrometry (LC-MS/MS) workflow coupled to ComPIL in search of aberrant UC-associated proteins. We found 176 host and microbial protein groups differentially enriched between the healthy (control) or UC volunteer groups. Notably, gene ontology (GO) enrichment analysis revealed that serine-type endopeptidases are overrepresented in UC compared to healthy volunteers. Additionally, we demonstrate the feasibility of serine hydrolase chemical enrichment from fecal samples using a biotinylated fluorophosphate (FP) probe. Our findings illustrate that probe-susceptible hydrolases from hosts and microbes are likely active in the distal gut. Finally, we applied de novo peptide sequencing methods to our metaproteomics data to estimate the size of the “dark peptidome,” the complement of peptides unidentified using ComPIL. We posit that our metaproteomics methods are generally applicable to future microbiota analyses and that our list of FP probe-enriched hydrolases may represent an important functionality to understanding the etiology of UC.

Biochemistry

Oncology

WikiGenomes: an open Web application for community consumption and curation of gene annotation data in Wikidata.

Tim Putman et al., Jan 21, 2017

With the advancement of genome sequencing technologies, new genomes are being sequenced daily. While these sequences are deposited in publicly available data warehouses, their functional and genomic annotations mostly reside in the text of primary publications. Biocurators are hard at work extracting those annotations from the literature for the most studied organisms and depositing them in structured databases. However, the resources don't exist to fund the comprehensive curation of the thousands of newly sequenced organisms in this manner. Here we describe WikiGenomes (wikigenomes.org), a web application that facilitates the consumption and curation of genomic data by the entire scientific community. WikiGenomes is based on Wikidata, an openly editable knowledge graph with the goal of aggregating published knowledge into a free and open database. WikiGenomes empowers community curation of genomic and biomedical knowledge through a domain-specific application built on top of Wikidata, bringing that curated knowledge to the public domain.

Genetics

Artificial Intelligence

Triflic acid treatment enables LC-MS/MS analysis of insoluble bacterial biomass

Ana Wang et al., Jul 11, 2018

The lysis and extraction of soluble bacterial proteins from cells is a common practice for proteomics analyses, but insoluble bacterial biomasses are often left behind. Here, we show that with triflic acid treatment, the insoluble bacterial biomass of Gram- and Gram+ bacteria can be rendered soluble. We use LC-MS/MS shotgun proteomics to show that bacterial proteins in the soluble and insoluble post-lysis fractions differ significantly. Additionally, in the case of Gram- Pseudomonas aeruginosa, triflic acid treatment enables the enrichment of cell envelope-associated proteins. Finally, we apply triflic acid to a human microbiome sample to show that this treatment is robust and enables the identification of a new, complementary subset of proteins from a complex microbial mixture.

Genetics

Biochemistry
