ResearchHub | Open Science Community

Critical assessment of protein intrinsic disorder prediction

Marco Necci et al., Apr 19, 2021

Intrinsically disordered proteins, defying the traditional protein structure-function paradigm, are a challenge to study experimentally. Because a large part of our knowledge rests on computational predictions, it is crucial that their accuracy is high. The Critical Assessment of protein Intrinsic Disorder prediction (CAID) experiment was established as a community-based blind test to determine the state of the art in prediction of intrinsically disordered regions and the subset of residues involved in binding. A total of 43 methods were evaluated on a dataset of 646 proteins from DisProt. The best methods use deep learning techniques and notably outperform physicochemical methods. The top disorder predictor has Fmax = 0.483 on the full dataset and Fmax = 0.792 following filtering out of bona fide structured regions. Disordered binding regions remain hard to predict, with Fmax = 0.231. Interestingly, computing times among methods can vary by up to four orders of magnitude.
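As a rough illustration of the Fmax figures quoted above: Fmax is the largest F1 score obtained over all thresholds applied to a predictor's per-residue scores. The sketch below assumes hypothetical inputs (`scores`, `labels`) and pools whatever residues it is given; the abstract does not say how CAID aggregates across proteins, so this is not necessarily the evaluation code used in the experiment.

```python
def f_max(scores, labels, thresholds=None):
    """Return (best_f1, best_threshold) over candidate score thresholds.

    scores: per-residue disorder scores (higher means more disordered)
    labels: per-residue reference annotation, 1 = disordered, 0 = ordered
    Both names are illustrative and not taken from CAID.
    """
    if thresholds is None:
        # Every distinct score is a candidate cut-off.
        thresholds = sorted(set(scores))
    best_f1, best_t = 0.0, None
    for t in thresholds:
        # A residue is predicted disordered when its score reaches t.
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        if tp == 0:
            continue  # F1 is zero (or undefined) without true positives
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_f1, best_t = f1, t
    return best_f1, best_t
```

Calling `f_max(scores, labels)` on the pooled residues of a benchmark would give one number comparable in spirit to the values above, under the pooling assumption stated.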

Genetics

Artificial Intelligence

rawMSA: End-to-end Deep Learning Makes Protein Sequence Profiles and Feature Extraction obsolete

Claudio Mirabello et al., Aug 17, 2018

In the last few decades, huge efforts have been made in the bioinformatics community to develop machine learning-based methods for the prediction of structural features of proteins in the hope of answering fundamental questions about the way proteins function and about their involvement in several illnesses. The recent advent of Deep Learning has renewed the interest in neural networks, with dozens of methods being developed in the hope of taking advantage of these new architectures. On the other hand, most methods are still based on heavy pre-processing of the input data, as well as the extraction and integration of multiple hand-picked, manually designed features. Since Multiple Sequence Alignments (MSA) are almost always the main source of information in de novo prediction methods, it should be possible to develop Deep Networks to automatically refine the data and extract useful features from it. In this work, we propose a new paradigm for the prediction of protein structural features called rawMSA. The core idea behind rawMSA is borrowed from the field of natural language processing to map amino acid sequences into an adaptively learned continuous space. This allows the whole MSA to be input into a Deep Network, thus rendering sequence profiles and other pre-calculated features obsolete. We developed rawMSA in three different flavors to predict secondary structure, relative solvent accessibility and inter-residue contact maps. We have rigorously trained and benchmarked rawMSA on a large set of proteins and have determined that it outperforms classical methods based on position-specific scoring matrices (PSSM) when predicting secondary structure and solvent accessibility, while performing on a par with the top ranked CASP12 methods in the inter-residue contact map prediction category. We believe that rawMSA represents a promising, more powerful approach to protein structure prediction that could replace older methods based on protein profiles in the coming years. Availability: datasets, dataset generation code, evaluation code and models are available at https://bitbucket.org/clami66/rawmsa

Genetics

Artificial Intelligence

aMeta: an accurate and memory-efficient ancient Metagenomic profiling workflow

Zoé Pochon et al., Oct 5, 2022

Analysis of microbial data from archaeological samples is a rapidly growing field with great potential for understanding ancient environments, lifestyles and disease spread in the past. However, high error rates have been a long-standing challenge in ancient metagenomics analysis. This is further complicated by a limited choice of ancient microbiome-specific computational frameworks that meet the growing computational demands of the field. Here, we propose aMeta, an accurate ancient Metagenomic profiling workflow designed primarily to minimize the number of false discoveries and computer memory requirements. Using simulated ancient metagenomic samples, we benchmark aMeta against a current state-of-the-art workflow, and demonstrate its superior sensitivity and specificity in both microbial detection and authentication, as well as substantially lower usage of computer memory. aMeta is implemented as a Snakemake workflow to facilitate use and reproducibility.

Genetics

Biochemistry

Predicting protein-peptide interaction sites using distant protein complexes as structural templates

Isak Johansson-Åkhe et al., Aug 23, 2018

Protein-peptide interactions play an important role in major cellular processes, and are associated with several human diseases. To understand and potentially regulate these cellular functions and diseases it is important to know the molecular details of the interactions. However, because of peptide flexibility and the transient nature of protein-peptide interactions, peptides are difficult to study experimentally. Thus, computational methods for predicting structural information about protein-peptide interactions are needed. Here we present InterPep, a pipeline for predicting protein-peptide interaction sites. It is a novel pipeline that, given a protein structure and a peptide sequence, utilizes structural template matches, sequence information, random forest machine learning, and hierarchical clustering to predict what region of the protein structure the peptide is most likely to bind. When tested on its ability to predict binding sites, InterPep successfully pinpointed 255 of 502 (50.7%) binding sites in experimentally determined structures at rank 1 and 348 of 502 (69.3%) among the top five predictions using only structures with no significant sequence similarity as templates. InterPep is a powerful tool for identifying peptide-binding sites; with a precision of 80% at a recall of 20% it should be an excellent starting point for docking protocols or experiments investigating peptide interactions. The source code for InterPep is available at http://wallnerlab.org/InterPep/

Biochemistry

Molecular Biology

InterPepRank: Assessment of Docked Peptide Conformations by a Deep Graph Network

Isak Johansson-Åkhe et al., Sep 8, 2020

Motivation: Peptide-protein interactions between a smaller or disordered peptide stretch and a folded receptor make up a large part of all protein-protein interactions. A common approach for modelling such interactions is to exhaustively sample the conformational space by fast Fourier transform docking, and then refine a top percentage of decoys. Commonly, methods capable of ranking the decoys for selection in a short enough time for larger-scale studies rely on first-principle energy terms such as electrostatics and van der Waals forces, or on pre-calculated statistical pairwise potentials. Results: We present InterPepRank for peptide-protein complex scoring and ranking. InterPepRank is a machine-learning-based method which encodes the structure of the complex as a graph, with physical pairwise interactions as edges and evolutionary and sequence features as nodes. The graph network is trained to predict the LRMSD of decoys by using edge-conditioned graph convolutions on a large set of peptide-protein complex decoys. InterPepRank is tested on a massive independent test set with no targets sharing CATH annotation or 30% sequence identity with any target in training or validation data. On this set, InterPepRank has a median AUC of 0.86 for finding coarse peptide-protein complexes with LRMSD < 4 Å. This is an improvement compared to other state-of-the-art ranking methods that have a median AUC of circa 0.69. When included as the method for selecting decoys for refinement in a previously established peptide docking pipeline, InterPepRank improves the number of Medium and High quality models produced by 80% and 40%, respectively. Availability: The program is available from http://wallnerlab.org/InterPepRank Contact: Björn Wallner, bjorn.wallner@liu.se Supplementary information: Supplementary data are available at BioRxiv online.
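To make the "median AUC for finding complexes with LRMSD < 4 Å" figure concrete, here is a minimal sketch of a per-target ROC AUC computed from ranking scores, followed by the median over targets. The function names and the input layout are hypothetical. The sketch assumes a score where higher means better, so a predicted LRMSD would have to be negated first; it is not InterPepRank's own evaluation code.

```python
from statistics import median


def roc_auc(scores, is_positive):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative decoy")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def decoy_auc(predicted_scores, lrmsd, cutoff=4.0):
    """AUC for one target: decoys with LRMSD below the cutoff are positives."""
    return roc_auc(predicted_scores, [d < cutoff for d in lrmsd])


def median_auc(targets):
    """targets: iterable of (predicted_scores, lrmsd) pairs, one per target."""
    return median(decoy_auc(scores, lrmsd) for scores, lrmsd in targets)
```

The pairwise loop is quadratic in the number of decoys, which is fine for a sketch but would likely be replaced by a rank-based formula for large decoy sets.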

Artificial Intelligence

Biochemistry

DockQ v2: Improved automatic quality measure for protein multimers, nucleic acids, and small molecules

Claudio Mirabello et al., Jun 2, 2024

It is important to assess the quality of modeled biomolecules to benchmark and assess the performance of different prediction methods. DockQ has emerged as the standard tool for assessing the quality of protein interfaces in model structures against given references. However, as predictions of large multimers with multiple chains become more common, there is a need to update DockQ with more functionality for robustness and speed. Moreover, as the field progresses and more methods are released to predict interactions between proteins and other types of molecules, such as nucleic acids and small molecules, it becomes necessary to have a tool that can assess all types of interactions. Here, we present a complete reimplementation of DockQ in pure Python. The updated version of DockQ is more portable, faster and introduces novel functionalities, such as automatic DockQ calculations for multiple interfaces and automatic chain mapping with multi-threading. These enhancements are designed to facilitate comparative analyses of protein complexes, particularly large multi-chain complexes. Furthermore, DockQ is now also able to score interfaces between proteins, nucleic acids, and small molecules. Code: https://wallnerlab.org/DockQ

Biochemistry

Molecular Biology

dgram2dmap: Extraction, visualisation and formatting of distance constraints from AlphaFold distograms

Björn Wallner et al., Dec 12, 2022

Distograms are data structures output by AlphaFold, along with the predicted 3D coordinates of the target protein, that encode predictions about Euclidean distances between pairs of amino acids. Although distograms are often overlooked, they do provide information that is to some extent complementary to that of the final 3D model. Here, we introduce dgram2dmap, a simple tool to convert distograms into distance maps that can be visualised and used in external tools for further downstream analyses. dgram2dmap runs within seconds to minutes, is open source and available on GitHub: https://github.com/clami66/dgram2dmap.
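One plausible way to collapse a distogram into a distance map is to take, for each residue pair, the probability-weighted mean of the distance bins. The sketch below illustrates that idea only. The abstract does not say which reduction dgram2dmap uses, and the input layout (`logits[i][j]` as a list of per-bin scores, `bin_centers` in Å) is an assumption made for this example rather than AlphaFold's actual file format.

```python
import math


def softmax(logits):
    """Turn per-bin scores into probabilities (numerically stable form)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_distance_map(logits, bin_centers):
    """Return a 2D list where entry [i][j] is the expected distance for pair (i, j).

    logits: nested list, logits[i][j] holds one score per distance bin (assumed layout)
    bin_centers: representative distance of each bin, same length as the bin axis
    """
    dmap = []
    for row in logits:
        dmap_row = []
        for pair_logits in row:
            probs = softmax(pair_logits)
            # Probability-weighted mean of the bin centers.
            dmap_row.append(sum(p * c for p, c in zip(probs, bin_centers)))
        dmap.append(dmap_row)
    return dmap
```

Taking the most probable bin instead of the expectation would be an equally simple alternative; the expectation gives smoother values when the distribution is spread over neighbouring bins.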

Philosophy

Artificial Intelligence

Methods For Estimation Of Model Accuracy In CASP12

Arne Elofsson et al., May 30, 2017

Methods for reliably estimating the quality of 3D models of proteins are essential drivers for the wide adoption and serious acceptance of protein structure predictions by life scientists. In this paper, the most successful groups in CASP12 describe their latest methods for Estimates of Model Accuracy (EMA). We show that pure single-model accuracy estimation methods have shown clear progress since CASP11; the three top methods (MESHI, ProQ3, SVMQA) all perform better than the top method of CASP11 (ProQ2). The pure single-model accuracy estimation methods outperform quasi-single (ModFOLD6 variations) and consensus methods (Pcons, ModFOLDclust2, Pcomb-domain and Wallner) in model selection, but are still not as good as those methods in absolute model quality estimation and predictions of local quality. Finally, we show that when using contact-based model quality measures (CAD, lDDT) the single-model quality methods perform relatively better.

Philosophy

Molecular Biology

Topology independent structural matching discovers novel templates for protein interfaces

Claudio Mirabello et al., Dec 18, 2017

Motivation: Protein-protein interactions (PPI) are essential for the function of the cellular machinery. The rapid growth of protein-protein complexes with known 3D structures offers a unique opportunity to study PPI to gain crucial insights into protein function and the causes of many diseases. In particular, it would be extremely useful to compare interaction surfaces of monomers, as this would enable the pinpointing of potential interaction surfaces based solely on the monomer structure, without the need to predict the complete complex structure. While there are many structural alignment algorithms for individual proteins, very few have been developed for protein interfaces, and none that can align only the interface residues to other interfaces or surfaces of interacting monomer subunits in a topology-independent (non-sequential) manner. Results: We present InterComp, a method for topology and sequence-order independent structural comparisons. The method is general and can be applied to various structural comparison applications. By representing residues as independent points in space rather than as a sequence of residues, InterComp can be applied to a wide range of problems including interface-surface comparisons, interface-interface comparisons and even comparisons of small molecule ligands. We demonstrate a use-case by applying InterComp to find similar protein interfaces on the surface of proteins. We show that InterComp pinpoints the correct interface for almost half of the targets (283 of 586) when considering the top 10 hits, and for 24% of the top 1, even when no templates can be found with the already available sequence-order dependent methods like TM-align.

Biochemistry

Molecular Biology

InterLig: a fast and accurate software for ligand-based virtual screening

Claudio Mirabello et al., Feb 10, 2019

In the past few years, drug discovery processes have been relying more and more on computational methods to sift out the most promising molecules before time and resources are spent to test them in experimental settings. Whenever the protein target of a given disease is not known, it becomes fundamental to have accurate methods for ligand-based Virtual Screening, which compare known active molecules against vast libraries of candidate compounds. Recently, 3D-based similarity methods have been developed that are capable of scaffold hopping and of superimposing matching molecules. Here, we present InterLig, a new method for the comparison and superposition of small molecules based on 3D, topologically independent alignments of atoms. We test InterLig on a standard benchmark and show that it compares favorably to the best currently available 3D methods. InterLig is open source and is available to everyone at http://wallnerlab.org/interlig.

Artificial Intelligence

Molecular Biology
