ResearchHub | Open Science Community

Metrics for GO based protein semantic similarity: a systematic evaluation

Cátia Pesquita et al., Apr 1, 2008

Several semantic similarity measures have been applied to gene products annotated with Gene Ontology terms, providing a basis for their functional comparison. However, it is still unclear which is the best approach to semantic similarity in this context, since there is no conclusive evaluation of the various measures. Another issue is whether electronic annotations should be used in semantic similarity calculations. We conducted a systematic evaluation of GO-based semantic similarity measures using the relationship with sequence similarity as a means to quantify their performance, and assessed the influence of electronic annotations by testing the measures in the presence and absence of these annotations. We verified that the relationship between semantic and sequence similarity is not linear, but can be well approximated by a rescaled Normal cumulative distribution function. Given that the majority of the semantic similarity measures capture an identical behaviour, but differ in resolution, we used the latter as the main criterion of evaluation. This work has provided a basis for the comparison of several semantic similarity measures, and can aid researchers in choosing the most adequate measure for their work. We have found that the hybrid simGIC was the measure with the best overall performance, followed by Resnik's measure using a best-match average combination approach. We have also found that the average and maximum combination approaches are problematic, since both are inherently influenced by the number of terms being combined. We suspect that data circularity, arising from functional inference based on sequence similarity, may directly influence the behaviour of the results that include electronic annotations.
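The measures named in this abstract can be illustrated with a minimal sketch. The toy ontology, the term names, and the annotation frequencies below are hypothetical and are not taken from the paper; the functions follow the commonly cited definitions of Resnik's measure, the best-match average combination, and simGIC (a Jaccard index weighted by information content). The sketch also shows why an all-pairs average is sensitive to the number of terms: a protein compared with itself scores lower under the all-pairs average than under the best-match average.

```python
from math import log

# Hypothetical toy ontology: each term maps to its ancestors, itself included.
ANCESTORS = {
    "root": {"root"},
    "binding": {"binding", "root"},
    "catalysis": {"catalysis", "root"},
    "dna_binding": {"dna_binding", "binding", "root"},
    "kinase": {"kinase", "catalysis", "root"},
}

# Hypothetical annotation frequencies (fraction of proteins annotated with
# the term or one of its descendants).
FREQ = {"root": 1.0, "binding": 0.5, "catalysis": 0.4,
        "dna_binding": 0.1, "kinase": 0.05}


def ic(term):
    """Information content: rarer terms are more informative."""
    return -log(FREQ[term]) + 0.0  # "+ 0.0" turns -0.0 into 0.0 for the root


def resnik(t1, t2):
    """Resnik similarity: IC of the most informative common ancestor."""
    return max(ic(t) for t in ANCESTORS[t1] & ANCESTORS[t2])


def pairwise_average(terms_a, terms_b):
    """Average combination: mean over all term pairs."""
    scores = [resnik(a, b) for a in terms_a for b in terms_b]
    return sum(scores) / len(scores)


def best_match_average(terms_a, terms_b):
    """Best-match average: each term is paired with its best match only."""
    best_a = [max(resnik(a, b) for b in terms_b) for a in terms_a]
    best_b = [max(resnik(a, b) for a in terms_a) for b in terms_b]
    return (sum(best_a) / len(best_a) + sum(best_b) / len(best_b)) / 2


def sim_gic(terms_a, terms_b):
    """simGIC: IC-weighted Jaccard index over the extended annotation sets."""
    ext_a = set().union(*(ANCESTORS[t] for t in terms_a))
    ext_b = set().union(*(ANCESTORS[t] for t in terms_b))
    denom = sum(ic(t) for t in ext_a | ext_b)
    if denom == 0:
        return 0.0
    return sum(ic(t) for t in ext_a & ext_b) / denom


protein = {"dna_binding", "kinase"}
print("simGIC (self):", sim_gic(protein, protein))
print("BMA (self):", best_match_average(protein, protein))
print("All-pairs average (self):", pairwise_average(protein, protein))
```

In this toy setting the all-pairs average penalises the protein for having two unrelated annotations, even when it is compared with itself, which is one way to read the abstract's remark about the number of terms being combined.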

Philosophy

Artificial Intelligence


Supervised Semantic Similarity

Rita Sousa et al., Feb 16, 2021

Background: Semantic similarity between concepts in knowledge graphs is essential for several bioinformatics applications, including the prediction of protein-protein interactions and the discovery of associations between diseases and genes. Although knowledge graphs describe entities in terms of several perspectives (or semantic aspects), state-of-the-art semantic similarity measures are general-purpose. This can represent a challenge since different use cases for the application of semantic similarity may need different similarity perspectives and ultimately depend on expert knowledge for manual fine-tuning. Results: We present a new approach that uses supervised machine learning to tailor aspect-oriented semantic similarity measures to fit a particular view on biological similarity or relatedness. We implement and evaluate it using different combinations of representative semantic similarity measures and machine learning methods with four biological similarity views: protein-protein interaction, protein function similarity, protein sequence similarity and phenotype-based gene similarity. Conclusions: The results demonstrate that our approach outperforms non-supervised methods, producing semantic similarity models that fit different biological perspectives significantly better than the commonly used manual combinations of semantic aspects.
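The idea of learning how to combine per-aspect similarities, rather than averaging them by hand, can be sketched as follows. This is only an illustration under stated assumptions: the per-aspect scores, the target values, and the hidden weights are hypothetical, and plain linear regression fitted by gradient descent stands in for the machine learning methods evaluated in the paper, which the abstract does not name.

```python
# Hypothetical per-aspect semantic similarities for six protein pairs,
# one score per semantic aspect (three aspects assumed for illustration).
PAIRS = [
    (0.9, 0.2, 0.4),
    (0.1, 0.8, 0.5),
    (0.5, 0.5, 0.9),
    (0.3, 0.1, 0.2),
    (0.7, 0.9, 0.1),
    (0.2, 0.4, 0.8),
]

# Hypothetical "biological view": a target similarity that depends mostly
# on the first aspect. In practice this would be a measured proxy.
HIDDEN_W = (0.7, 0.2, 0.1)
TARGETS = [sum(w * x for w, x in zip(HIDDEN_W, p)) for p in PAIRS]


def predict(weights, features):
    """Weighted combination of the per-aspect similarities."""
    return sum(w * x for w, x in zip(weights, features))


def mse(weights):
    """Mean squared error of a weighting against the target view."""
    return sum((predict(weights, p) - t) ** 2
               for p, t in zip(PAIRS, TARGETS)) / len(PAIRS)


def fit(weights, lr=0.1, epochs=2000):
    """Fit the aspect weights by batch gradient descent on the MSE."""
    w = list(weights)
    n = len(PAIRS)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for p, t in zip(PAIRS, TARGETS):
            err = predict(w, p) - t
            for j, x in enumerate(p):
                grad[j] += 2 * err * x / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


# Manual baseline: unweighted average of the aspects.
BASELINE = [1 / 3] * 3
LEARNED = fit(BASELINE)

print("baseline MSE:", mse(BASELINE))
print("learned MSE:", mse(LEARNED))
print("learned weights:", [round(w, 3) for w in LEARNED])
```

Because the fit starts from the unweighted average, any reduction in error is directly attributable to tailoring the weights to the target view.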

Artificial Intelligence

Molecular Biology


The Immunopeptidomics Ontology (ImPO)

Daniel Faria et al., Jan 1, 2024

The adaptive immune response plays a vital role in eliminating infected and aberrant cells from the body. This process hinges on the presentation of short peptides by major histocompatibility complex Class I molecules on the cell surface. Immunopeptidomics, the study of peptides displayed on cells, delves into the wide variety of these peptides. Understanding the mechanisms behind antigen processing and presentation is crucial for effectively evaluating cancer immunotherapies. As an emerging domain, immunopeptidomics currently lacks standardization: there is neither an established terminology nor formally defined semantics, a critical concern considering the complexity, heterogeneity, and growing volume of data involved in immunopeptidomics studies. Additionally, there is a disconnect between how the proteomics community delivers information about antigen presentation and its uptake by the clinical genomics community. Considering the significant relevance of immunopeptidomics in cancer, this shortcoming must be addressed to bridge the gap between research and clinical practice. In this work, we detail the development of the ImmunoPeptidomics Ontology, ImPO, the first effort at standardizing the terminology and semantics in the domain. ImPO aims to encapsulate and systematize data generated by immunopeptidomics experimental processes and bioinformatics analysis. ImPO establishes cross-references to 24 relevant ontologies, including the National Cancer Institute Thesaurus, Mondo Disease Ontology, Logical Observation Identifier Names and Codes and Experimental Factor Ontology. Although ImPO was developed using expert knowledge to characterize a large and representative data collection, it may be readily used to encode other datasets within the domain. Ultimately, ImPO facilitates data integration and analysis, enabling querying, inference and knowledge generation and, importantly, bridging the gap between the clinical proteomics and genomics communities. As the field of immunogenomics uses protein-level immunopeptidomics data, we expect ImPO to play a key role in supporting a rich and standardized description of the large-scale data that emerging high-throughput technologies are expected to bring in the near future.
Ontology URL: https://zenodo.org/record/10237571
Project GitHub: https://github.com/liseda-lab/ImPO/blob/main/ImPO.owl

Philosophy

Molecular Biology


Microenvironment of metastatic site reveals key predictors of PD-1 blockade response in renal cell carcinoma

Florian Jeanneret et al., Jul 19, 2023

Immune checkpoint blockade (ICB) therapies have improved the overall survival (OS) of many patients with advanced cancers. However, the response rate to ICB varies widely among patients, exposing non-responders to potentially severe immune-related adverse events. The discovery of new biomarkers to identify patients responding to ICB is now a critical need in the clinic. We therefore investigated the tumor microenvironment (TME) of advanced clear cell renal cell carcinoma (ccRCC) samples from primary and metastatic sites to identify molecular and cellular markers of response to ICB. We revealed a significant discrepancy in treatment response between subgroups based on cell fractions inferred from metastatic sites. One of the subgroups was enriched in non-responders and harbored a lower fraction of CD8+ T cells and plasma cells, as well as a decreased expression of immunoglobulin genes. In addition, we developed the Tumor-Immunity Differential (TID) score which combines features from tumor cells and the TME to accurately predict response to anti-PD-1 immunotherapy (AUC-ROC=0.88, log-rank tests for PFS P < 0.0001, OS P = 0.01). Finally, we also defined TID-related genes (YWHAE, CXCR6 and BTF3), among which YWHAE was validated as a robust predictive marker of ICB response in independent cohorts of pre- or on-treatment biopsies of melanoma and lung cancers. Overall, these results provide a rationale to further explore variations in the cell composition of metastatic sites, and underlying gene signatures, to predict patient response to ICB treatments.

Oncology

Immunology


A comprehensive library of canonical and non-canonical MHC class I antigens for cancer vaccine development.

Georges Bedran et al., Jan 17, 2022

A longstanding disconnect between the growing number of MHC Class I immunopeptidomic studies and genomic medicine hinders cancer vaccine design. We develop COD-dipp to genomically map the full spectrum of detected canonical and non-canonical (non-exonic) MHC Class I antigens from 26 cancer studies. We demonstrate that patient mutations in regions overlapping physically identified antigens better predict immunotherapy response when compared to neoantigen predictions. We suggest a vaccine design approach using 140,966 highly immune-visible regions of the genome annotated by their expression and haplotype frequency in the human population. These regions tend to be highly conserved, mutated in cancer and harbor 7.8 times more immunogenicity. Intersecting pan-cancer mutations with these immune-surveilled regions revealed a potential to create off-the-shelf multi-epitope vaccines against public neoantigens. Here we release COD-dipp, a cancer vaccine toolkit, as a web application (www.proteogenomics.ca/codipp) and open-source high-throughput resource (upon peer-review).
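The intersection step mentioned in this abstract, matching mutation positions against a set of genomic regions, can be sketched with a sorted-interval lookup. This is not the COD-dipp implementation; the coordinates, chromosome names, and function names are hypothetical, and the sketch assumes half-open, non-overlapping regions on each chromosome.

```python
from bisect import bisect_right

# Hypothetical immune-visible regions: chromosome -> half-open [start, end)
# intervals, assumed not to overlap within a chromosome.
REGIONS = {
    "chr1": [(500, 650), (100, 200)],
    "chr2": [(50, 80)],
}

# Hypothetical mutation calls as (chromosome, position) pairs.
MUTATIONS = [("chr1", 150), ("chr1", 200), ("chr1", 649),
             ("chr2", 10), ("chr3", 5)]


def build_index(regions):
    """Sort each chromosome's intervals and split them into start/end lists."""
    index = {}
    for chrom, intervals in regions.items():
        ordered = sorted(intervals)
        index[chrom] = ([s for s, _ in ordered], [e for _, e in ordered])
    return index


def in_region(index, chrom, pos):
    """Return True if pos lies inside one of the chromosome's intervals."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    # Locate the last interval whose start is <= pos, then check its end.
    i = bisect_right(starts, pos) - 1
    return i >= 0 and pos < ends[i]


def mutations_in_regions(mutations, regions):
    """Keep only the mutations that fall inside a region."""
    index = build_index(regions)
    return [m for m in mutations if in_region(index, *m)]


print(mutations_in_regions(MUTATIONS, REGIONS))
```

Binary search keeps each lookup logarithmic in the number of regions per chromosome, which matters when the region set is large; overlapping regions would need to be merged first for this lookup to remain correct.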

Genetics

Immunology
