ResearchHub | Open Science Community

DisGeNET: a comprehensive platform integrating information on human disease-associated genes and variants

Janet Piñero et al., Oct 18, 2016

Information about the genetic basis of human diseases lies at the heart of precision medicine and drug discovery. However, to realize its full potential to support these goals, several problems, such as fragmentation, heterogeneity, availability and different conceptualization of the data, must be overcome. To provide the community with a resource free of these hurdles, we have developed DisGeNET (http://www.disgenet.org), one of the largest available collections of genes and variants involved in human diseases. DisGeNET integrates data from expert curated repositories, GWAS catalogues, animal models and the scientific literature. DisGeNET data are homogeneously annotated with controlled vocabularies and community-driven ontologies. Additionally, several original metrics are provided to assist the prioritization of genotype–phenotype relationships. The information is accessible through a web interface, a Cytoscape App, an RDF SPARQL endpoint, scripts in several programming languages and an R package. DisGeNET is a versatile platform that can be used for different research purposes including the investigation of the molecular underpinnings of specific human diseases and their comorbidities, the analysis of the properties of disease genes, the generation of hypotheses on drug therapeutic action and drug adverse effects, the validation of computationally predicted disease genes and the evaluation of the performance of text-mining methods.

Genetics

Molecular Biology

The DisGeNET knowledge platform for disease genomics: 2019 update

Janet Piñero et al., Oct 18, 2019

One of the most pressing challenges in genomic medicine is to understand the role played by genetic variation in health and disease. Thanks to the exploration of genomic variants at large scale, hundreds of thousands of disease-associated loci have been uncovered. However, the identification of variants of clinical relevance is a significant challenge that requires comprehensive interrogation of previous knowledge and linkage to new experimental results. To assist in this complex task, we created DisGeNET (http://www.disgenet.org/), a knowledge management platform integrating and standardizing data about disease-associated genes and variants from multiple sources, including the scientific literature. DisGeNET covers the full spectrum of human diseases as well as normal and abnormal traits. The current release covers more than 24 000 diseases and traits, 17 000 genes and 117 000 genomic variants. The latest developments of DisGeNET include new sources of data, novel data attributes and prioritization metrics, a redesigned web interface and recently launched APIs. Thanks to the data standardization, the combination of expert curated information with data automatically mined from the scientific literature, and a suite of tools for accessing its publicly available data, DisGeNET is an interoperable resource supporting a variety of applications in genomic medicine and drug R&D.

Genetics

History

DisGeNET: a discovery platform for the dynamical exploration of human diseases and their genes

Janet Piñero et al., Apr 15, 2015

DisGeNET is a comprehensive discovery platform designed to address a variety of questions concerning the genetic underpinning of human diseases. DisGeNET contains over 380 000 associations between >16 000 genes and 13 000 diseases, which makes it one of the largest repositories currently available of its kind. DisGeNET integrates expert-curated databases with text-mined data, covers information on Mendelian and complex diseases, and includes data from animal disease models. It features a score based on the supporting evidence to prioritize gene-disease associations. It is an open access resource available through a web interface, a Cytoscape plugin and as a Semantic Web resource. The web interface supports user-friendly data exploration and navigation. DisGeNET data can also be analysed via the DisGeNET Cytoscape plugin, and enriched with the annotations of other plugins of this popular network analysis software suite. Finally, the information contained in DisGeNET can be expanded and complemented using Semantic Web technologies and linked to a variety of resources already present in the Linked Data cloud. Hence, DisGeNET offers one of the most comprehensive collections of human gene-disease associations and a valuable set of tools for investigating the molecular mechanisms underlying diseases of genetic origin, designed to fulfill the needs of different user profiles, including bioinformaticians, biologists and health-care practitioners. Database URL: http://www.disgenet.org/

History

Artificial Intelligence

The DisGeNET cytoscape app: Exploring and visualizing disease genomics data

Janet Piñero et al., Jan 1, 2021

Thanks to the unbiased exploration of genomic variants at large scale, hundreds of thousands of disease-associated loci have been uncovered. In parallel, network-based approaches have proven to be essential to understand the molecular mechanisms underlying human diseases. The use of these approaches has been boosted by the abundance of information about disease associated genes and variants, high quality human interactomics data, and the emergence of new types of omics data. The DisGeNET Cytoscape App combines the capabilities of Cytoscape with those of DisGeNET, a knowledge platform based on a comprehensive catalogue of disease-associated genes and variants. The DisGeNET Cytoscape App contains functions to query, analyze, and visualize different network representations of the gene-disease and variant-disease associations available in DisGeNET. It supports a wide variety of applications through its query and filter functionalities, including the annotation of foreign networks generated by other apps or uploaded by the user. The new release of the DisGeNET Cytoscape App has been designed to support Cytoscape 3.x and incorporates novel distinctive features such as visualization and analysis of variant-disease networks, disease enrichment analysis for genes and variants, and analytic support through Cytoscape Automation. Moreover, the DisGeNET Cytoscape App features an API to access its core functionalities via the REST protocol fostering the development of reproducible and scalable analysis workflows based on DisGeNET data.

Genetics

Artificial Intelligence

Extraction of relations between genes and diseases from text and large-scale data analysis: implications for translational research

Àlex Bravo et al., Feb 20, 2015

Current biomedical research needs to leverage and exploit the large amount of information reported in scientific publications. Automated text mining approaches, in particular those aimed at finding relationships between entities, are key for identification of actionable knowledge from free text repositories. We present the BeFree system aimed at identifying relationships between biomedical entities with a special focus on genes and their associated diseases. By exploiting morpho-syntactic information of the text, BeFree is able to identify gene-disease, drug-disease and drug-target associations with state-of-the-art performance. The application of BeFree to real-case scenarios shows its effectiveness in extracting information relevant for translational research. We show the value of the gene-disease associations extracted by BeFree through a number of analyses and integration with other data sources. BeFree succeeds in identifying genes associated with a major cause of morbidity worldwide, depression, which are not present in other public resources. Moreover, large-scale extraction and analysis of gene-disease associations, and integration with current biomedical knowledge, provided interesting insights on the kind of information that can be found in the literature, and raised challenges regarding data prioritization and curation. We found that only a small proportion of the gene-disease associations discovered by using BeFree is collected in expert-curated databases. Thus, there is a pressing need to find alternative strategies to manual curation, in order to review, prioritize and curate text-mining data and incorporate it into domain-specific databases. We present our strategy for data prioritization and discuss its implications for supporting biomedical research and applications. BeFree is a novel text mining system that performs competitively for the identification of gene-disease, drug-disease and drug-target associations.
Our analyses show that mining only a small fraction of MEDLINE results in a large dataset of gene-disease associations, and only a small proportion of this dataset is actually recorded in curated resources (2%), raising several issues on data prioritization and curation. We propose that joint analysis of text mined data with data curated by experts appears as a suitable approach to both assess data quality and highlight novel and interesting information.

Genetics

Artificial Intelligence

Capturing variation impact on molecular interactions in the IMEx Consortium mutations data set

J. Khadake et al., Dec 27, 2018

The current wealth of genomic variation data identified at nucleotide level presents the challenge of understanding by which mechanisms amino acid variation affects cellular processes. These effects may manifest as distinct phenotypic differences between individuals or result in the development of disease. Physical interactions between molecules are the linking steps underlying most, if not all, cellular processes. Understanding the effects that sequence variation has on a molecule’s interactions is a key step towards connecting mechanistic characterization of nonsynonymous variation to phenotype. We present an open access resource created over 14 years by IMEx database curators, featuring 28,000 annotations describing the effect of small sequence changes on physical protein interactions. We describe how this resource was built, the formats in which the data is provided and offer a descriptive analysis of the data set. The data set is publicly available through the IntAct website and is enhanced with every monthly release.

Genetics

Artificial Intelligence

The human hepatocyte TXG-MAPr: WGCNA transcriptomic modules to support mechanism-based risk assessment

Giulia Callegaro et al., May 18, 2021

Mechanism-based risk assessment is urged to advance and fully permeate into current safety assessment practices, possibly at early phases of drug safety testing. Toxicogenomics is a promising source of comprehensive, mechanism-revealing data, but analysis tools that interpret mechanisms of toxicity and are specific to the testing systems (e.g. hepatocytes) are lacking. In this study we present the TXG-MAPr webtool (available at https://txg-mapr.eu/WGCNA_PHH/TGGATEs_PHH/ ), an R-Shiny-based implementation of weighted gene co-expression networks (WGCNA) obtained from the Primary Human Hepatocytes (PHH) TG-GATEs dataset. Gene co-expression networks (modules) were annotated with functional information (pathway enrichment, transcription factor) to reveal their mechanistic interpretation. Several well-known stress response pathways were captured in the modules, are perturbed by specific stressors and are preserved in rat systems (rat primary hepatocytes and rat in vivo liver), highlighting stress responses that translate across species/testing systems. The TXG-MAPr tool was successfully applied to investigate the mechanism of toxicity of TG-GATEs compounds, and was also used with external datasets obtained from different hepatocyte cells and microarray platforms. Additionally, we suggest that module responses can be calculated from targeted RNA-seq data, therefore imputing biological responses from a limited gene set. By analyzing 50 different PHH donors’ responses to a common stressor, tunicamycin, we were able to suggest modules associated with donors’ traits, e.g. pre-existing disease state, and therefore connected to donor variability. In conclusion, we demonstrated that gene co-expression analysis coupled to an interactive visualization environment, the TXG-MAPr, is a promising approach to achieve mechanistically relevant, cross-species and cross-platform evaluation of toxicogenomic data.

Genetics

Philosophy

Predicting gene disease associations with knowledge graph embeddings for diseases with curtailed information

Francesco Gualdi et al., Jan 15, 2024

Knowledge graph embeddings (KGE) are a powerful technique used in the biological domain to represent biological knowledge in a low dimensional space. However, a deep understanding of these methods is still missing, in particular of their limitations for diseases with reduced information on gene-disease associations. In this contribution, we built a knowledge graph (KG) by integrating heterogeneous biomedical data and generated KGEs by implementing state-of-the-art methods, and two novel algorithms: DLemb and BioKG2Vec. Extensive testing of the embeddings with unsupervised clustering and supervised methods showed that our novel approaches outperform existing algorithms in both scenarios. Our results indicate that data preprocessing and integration influence the quality of the predictions and that the embeddings efficiently encode biological information when compared to a null model. Finally, we employed KGE to predict genes associated with Intervertebral disc degeneration (IDD) and showed that functions relevant to the disease are enriched in the genes prioritized from the model.

Artificial Intelligence

Molecular Biology

A versatile and interoperable computational framework for the analysis and modeling of COVID-19 disease mechanisms

Anna Niarakis et al., Dec 19, 2022

The COVID-19 Disease Map project is a large-scale community effort uniting 277 scientists from 130 Institutions around the globe. We use high-quality, mechanistic content describing SARS-CoV-2-host interactions and develop interoperable bioinformatic pipelines for novel target identification and drug repurposing. Community-driven and highly interdisciplinary, the project is collaborative and supports community standards, open access, and the FAIR data principles. The coordination of community work allowed for an impressive step forward in building interfaces between Systems Biology tools and platforms. Our framework links key molecules highlighted from broad omics data analysis and computational modeling to dysregulated pathways in a cell-, tissue- or patient-specific manner. We also employ text mining and AI-assisted analysis to identify potential drugs and drug targets and use topological analysis to reveal interesting structural features of the map. The proposed framework is versatile and expandable, offering a significant upgrade in the arsenal used to understand virus-host interactions and other complex pathologies.

Ecology

Biophysics

Targeting comorbid diseases via network endopharmacology

Joaquim Aguirre‐Plans et al., May 4, 2018

The traditional drug discovery paradigm has been shaped around the idea of "one target, one disease". Recently, it has become clear that not only is it hard to achieve single target specificity, but it is also often more desirable to tinker with the complex cellular network by targeting multiple proteins, causing a paradigm shift towards polypharmacology (multiple targets, one disease). Given the lack of clear-cut boundaries across disease (endo)phenotypes and genetic heterogeneity across patients, a natural extension to the current polypharmacology paradigm is targeting common biological pathways involved in diseases, giving rise to "endopharmacology" (multiple targets, multiple diseases). In this study, leveraging powerful network medicine tools, we describe a recipe for first identifying common pathways pertaining to diseases and then prioritizing drugs that target these pathways towards endopharmacology. We present proximal pathway enrichment analysis (PxEA), which uses the topology information of the network of interactions between disease genes, pathway genes, drug targets and other proteins to rank drugs for their interactome-based proximity to pathways shared across multiple diseases, providing unprecedented drug repurposing opportunities. As a proof of principle, we focus on nine autoimmune disorders and, using PxEA, we show that many drugs indicated for these conditions are not necessarily specific to the condition of interest, but rather target the common biological pathways across these diseases. Finally, we provide the high scoring drug repurposing candidates that can target common mechanisms involved in type 2 diabetes and Alzheimer's disease, two phenotypes that have recently gained attention due to the increased comorbidity among patients.

Genetics

Ecology
