ResearchHub | Open Science Community

Hybrid protein-ligand binding residue prediction with protein language models: Does the structure matter?

Hamza Gamouh et al., Aug 15, 2023

Background: Predicting protein-ligand binding sites is crucial in studying protein interactions, with applications in biotechnology and drug discovery. Two distinct paradigms have emerged for this purpose: sequence-based methods, which leverage protein sequence information, and structure-based methods, which rely on the three-dimensional (3D) structure of the protein. We propose to study a hybrid approach that combines the strengths of both paradigms by integrating two recent deep learning architectures: protein language models (pLMs) from the sequence-based paradigm and Graph Neural Networks (GNNs) from the structure-based paradigm. Specifically, we construct a residue-level Graph Attention Network (GAT) model based on the protein's 3D structure that uses pre-trained pLM embeddings as node features. This integration enables us to study how the sequential information encoded in the protein sequence and the spatial relationships within the protein structure jointly affect the model's performance. Results: Using a benchmark dataset covering a range of ligands and ligand types, we show that adding structure information consistently enhances the predictive power of the baselines in absolute terms. Nevertheless, as more complex pLMs are employed to represent node features, the relative impact of the structure information represented by the GNN architecture diminishes. Conclusions: These observations suggest that, although using the experimental protein structure almost always improves the accuracy of binding site prediction, complex pLMs still contain structural information that leads to good predictive performance even without the 3D structure.
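The hybrid idea in this abstract (a residue graph derived from 3D coordinates, with per-residue embeddings as node features) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration, not the authors' implementation: the 8.0 Å distance cutoff, the function names `build_residue_graph` and `attention_layer`, the random vectors standing in for pLM embeddings, and the fixed attention vectors. A real GAT would also apply learned linear transforms and be trained end to end.

```python
import math
import random


def build_residue_graph(coords, cutoff=8.0):
    """Connect residues whose representative atoms lie within `cutoff` angstroms.

    The cutoff value is a hypothetical choice for this sketch.
    """
    n = len(coords)
    neighbors = {i: [i] for i in range(n)}  # self-loops keep each node's own features
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors


def attention_layer(features, neighbors, attn_src, attn_dst):
    """One simplified graph-attention step: softmax-weighted mean over neighbors."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    updated = []
    for i, nbrs in neighbors.items():
        # LeakyReLU-scored compatibility between residue i and each neighbor j
        scores = []
        for j in nbrs:
            s = dot(attn_src, features[i]) + dot(attn_dst, features[j])
            scores.append(s if s > 0 else 0.2 * s)
        peak = max(scores)  # subtract the maximum for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        dim = len(features[i])
        updated.append([
            sum(w / total * features[j][d] for w, j in zip(weights, nbrs))
            for d in range(dim)
        ])
    return updated


random.seed(0)
# Four toy residues on a line; the last one is far from the others.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (30.0, 0.0, 0.0)]
# Random vectors stand in for pre-trained pLM embeddings (assumption).
embeddings = [[random.gauss(0, 1) for _ in range(4)] for _ in coords]
graph = build_residue_graph(coords)
hidden = attention_layer(embeddings, graph, [0.1] * 4, [0.1] * 4)
```

In this toy input the isolated fourth residue has only its self-loop, so its updated features equal its input embedding, while the first three residues mix information from their spatial neighbors. That is the sense in which the graph adds structural context on top of sequence-derived features.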

Artificial Intelligence

Biochemistry

PrankWeb: web server for ligand binding-site prediction and visualization

Lukáš Jendele et al., Mar 28, 2019

PrankWeb is an online resource providing an interface to P2Rank, a state-of-the-art ligand binding site prediction method. P2Rank is a template-free machine learning method based on predicting the ligandability of local chemical neighborhoods centered on points placed on the solvent-accessible surface of a protein. Points with a high ligandability score are then clustered to form the resulting ligand binding sites. On top of that, PrankWeb provides a web interface that lets users easily carry out the prediction and visually inspect the predicted binding sites via an integrated sequence-structure view. Moreover, PrankWeb can determine sequence conservation for the input molecule and use it in both the prediction and results visualization steps. Alongside its online visualization options, PrankWeb can also export the results as a PyMOL script for offline visualization. The web frontend communicates with the server side via a REST API. Therefore, in high-throughput scenarios users can call the server API directly, bypassing the need for a web-based front end or installation of the P2Rank application. PrankWeb is available at http://prankweb.cz/. The source code of the web application and the P2Rank method can be accessed at https://github.com/jendelel/PrankWebApp and https://github.com/rdk/p2rank, respectively.
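The step "points with a high ligandability score are then clustered" can be sketched as a score threshold followed by single-linkage grouping. This is an illustrative sketch only, not P2Rank's code: the function name `cluster_ligandable_points`, the `min_score` and `link_dist` values, and the ranking of sites by summed point score are all assumptions made for the example.

```python
import math


def cluster_ligandable_points(points, scores, min_score=0.5, link_dist=3.0):
    """Group high-scoring surface points into candidate sites (single linkage).

    `min_score` and `link_dist` are hypothetical values chosen for this sketch.
    """
    kept = [i for i, s in enumerate(scores) if s >= min_score]
    unvisited = set(kept)
    sites = []
    for seed in kept:
        if seed not in unvisited:
            continue
        unvisited.discard(seed)
        cluster, frontier = [seed], [seed]
        while frontier:
            current = frontier.pop()
            # Pull in every remaining point within linking distance of `current`
            linked = [j for j in unvisited
                      if math.dist(points[current], points[j]) <= link_dist]
            for j in linked:
                unvisited.discard(j)
            cluster.extend(linked)
            frontier.extend(linked)
        sites.append(sorted(cluster))
    # Rank candidate sites by summed point score, highest first (assumed ranking rule)
    sites.sort(key=lambda c: sum(scores[i] for i in c), reverse=True)
    return sites


points = [(0, 0, 0), (2, 0, 0), (4, 0, 0), (20, 0, 0), (21, 0, 0), (50, 0, 0)]
scores = [0.9, 0.8, 0.7, 0.6, 0.2, 0.95]
sites = cluster_ligandable_points(points, scores)
print(sites)  # [[0, 1, 2], [5], [3]]
```

Point 4 is dropped by the threshold, points 0 to 2 chain together because consecutive points are within the linking distance, and the remaining two points form single-point sites.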

Biochemistry

Pharmacology

Coenzyme-Protein Interactions since Early Life

Alma Rocha et al., Jan 1, 2023

Recent findings in protein evolution and peptide prebiotic plausibility have been setting the stage for reconsidering the role of peptides in the early stages of life's origin. Ancient protein families have been found to share common themes, and proteins reduced in composition to prebiotically plausible amino acids have been reported to be capable of structure formation and key functions, such as binding to RNA. While this may suggest peptide relevance in early life, their functional repertoire when composed of a limited number of early residues (missing some of the most sophisticated functional groups of today's alphabet) has been debated. Cofactors enrich the functional scope of about half of extant enzymes, but whether they could also bind to peptides lacking the evolutionarily late amino acids remains speculative. The aim of this study was to resolve the early peptide propensity to bind organic cofactors by analyzing protein-coenzyme interactions across the Protein Data Bank (PDB). We find that the prebiotically plausible amino acids are more abundant in the binding sites of the most ancient coenzymes and that such interactions rely more frequently on the involvement of protein backbone atoms and metal ion cofactors. Moreover, we have identified a few select examples in today's enzymes where coenzyme binding is supported solely by prebiotically available amino acids. These results imply the plausibility of a coenzyme-peptide functional collaboration preceding the establishment of the Central Dogma and full protein alphabet evolution.

Biochemistry

Molecular Biology

AHoJ: rapid, tailored search and retrieval of apo and holo protein structures for user-defined ligands

Christos Feidakis et al., Sep 6, 2022

Understanding the mechanism of action of a protein or designing better ligands for it often requires access to a bound (holo) and an unbound (apo) state of the protein. Resources for the quick and easy retrieval of such conformations are severely limited. Apo-Holo Juxtaposition (AHoJ) is a web application for retrieving apo-holo structure pairs for user-defined ligands. Given a query structure and one or more defined ligands, it retrieves all other structures of the same protein that feature the same binding site(s), aligns them, and examines the superimposed binding sites to determine whether each structure is apo or holo with reference to the query. The resulting superimposed datasets of apo-holo pairs can be visualized and downloaded for further analysis. AHoJ accepts multiple input queries, allowing the creation of customized apo-holo datasets. To demonstrate AHoJ's functionality, we present a newly constructed dataset of apo-holo pairs featuring 13 ion ligands, built by complementing an existing database of biologically relevant holo interactions (BioLiP). Availability and Implementation: Freely available for non-commercial use at http://apoholo.cz.
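The apo/holo decision described here (examining a superimposed binding site to see whether a candidate structure carries a ligand there) can be pictured with a simple distance test. This is a hedged sketch under stated assumptions, not AHoJ's actual logic: the function name `classify_state`, the 4.0 Å contact distance, and the assumption that all coordinates are already in the query's frame are illustrative choices.

```python
import math


def classify_state(site_atoms, candidate_ligand_atoms, contact_dist=4.0):
    """Label a superimposed candidate structure as 'holo' or 'apo'.

    site_atoms: coordinates of the query's binding-site atoms.
    candidate_ligand_atoms: ligand atom coordinates of the candidate structure,
        assumed to be already superimposed onto the query.
    contact_dist: hypothetical contact threshold in angstroms.
    """
    for ligand_atom in candidate_ligand_atoms:
        # Any ligand atom close to the query's site marks the site as occupied
        if any(math.dist(ligand_atom, a) <= contact_dist for a in site_atoms):
            return "holo"
    return "apo"


site = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 1.5, 0.0)]
bound_candidate = [(2.0, 2.0, 0.0)]     # ligand atom sitting in the site
distant_candidate = [(25.0, 0.0, 0.0)]  # ligand bound elsewhere on the protein
empty_candidate = []                    # no ligand atoms at all

print(classify_state(site, bound_candidate))    # holo
print(classify_state(site, distant_candidate))  # apo
print(classify_state(site, empty_candidate))    # apo
```

The second case shows why the test is made per binding site: a structure can contain a ligand and still be apo with respect to the site defined by the query.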

Philosophy

Biochemistry

CryptoBench: Cryptic protein-ligand binding sites dataset and benchmark

Vít Škrhák et al., Aug 21, 2024

Structure-based methods for detecting protein-ligand binding sites play a crucial role in various domains, from fundamental research to biomedical applications. However, current prediction methodologies often rely on holo (ligand-bound) protein conformations for training and evaluation, overlooking the significance of the apo (ligand-free) states. This oversight is particularly problematic in the case of cryptic binding sites (CBSs), where holo-based assessment yields unrealistic performance expectations. To advance development in this domain, we introduce CryptoBench, a benchmark dataset tailored for training and evaluating novel CBS prediction methodologies. CryptoBench is constructed upon a large collection of apo-holo protein pairs, grouped by UniProt ID, clustered by sequence identity, and filtered to contain only structures with substantial structural change in the binding site. CryptoBench comprises 1,107 structures with predefined cross-validation splits, making it the most extensive CBS dataset to date. To establish a performance baseline, we measured the predictive power of sequence- and structure-based CBS residue prediction methods using the benchmark. We selected PocketMiner as the state-of-the-art representative of structure-based methods for CBS detection, and P2Rank, a widely used structure-based method for general binding site prediction that is not specifically tailored for cryptic sites. For sequence-based approaches, we trained a neural network to classify binding residues using protein language model embeddings. Our sequence-based approach outperformed PocketMiner and P2Rank across key metrics, including AUC, AUPRC, MCC, and F1 scores. These results provide baseline benchmark results for future CBS and potentially also non-CBS prediction endeavors, leveraging CryptoBench as the foundational platform for further advancements in the field.
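Two of the metrics named in this abstract, MCC and F1, have standard closed-form definitions over a confusion matrix, so a small worked example may help when reading residue-level results. The labels below are made-up toy data, not CryptoBench results, and the function names are chosen for this sketch.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary residue labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient; 0.0 when any marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy labels: 1 = binding residue, 0 = non-binding residue
y_true = [1, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 0, 0]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
f1 = f1_score(tp, fp, fn)
mcc_value = mcc(tp, fp, fn, tn)
print(tp, fp, fn, tn)  # 2 1 1 4
```

For these toy labels F1 is 4/6 and MCC is 7/15. Because binding residues are usually a small minority of all residues, MCC and AUPRC tend to be more informative than plain accuracy for this kind of task.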

Genetics

Artificial Intelligence
