ResearchHub | Open Science Community

Ensuring scientific reproducibility in bio-macromolecular modeling via extensive, automated benchmarks

Julia Leman et al., Apr 5, 2021

Abstract: Each year vast international resources are wasted on irreproducible research. The scientific community has been slow to adopt standard software engineering practices, despite the increases in high-dimensional data, complexities of workflows, and computational environments. Here we show how scientific software applications can be created in a reproducible manner when simple design goals for reproducibility are met. We describe the implementation of a test server framework and 40 scientific benchmarks, covering numerous applications in Rosetta bio-macromolecular modeling. High performance computing cluster integration allows these benchmarks to run continuously and automatically. Detailed protocol captures are useful for developers and users of Rosetta and other macromolecular modeling tools. The framework and design concepts presented here are valuable for developers and users of any type of scientific software and for the scientific community to create reproducible methods. Specific examples highlight the utility of this framework and the comprehensive documentation illustrates the ease of adding new tests in a matter of hours.

Software

Information Systems And Management

Reliable protein-protein docking with AlphaFold, Rosetta, and replica-exchange

Ameya Harmalkar et al., Jul 29, 2023

Despite the recent breakthrough of AlphaFold (AF) in the field of protein sequence-to-structure prediction, modeling protein interfaces and predicting protein complex structures remains challenging, especially when there is a significant conformational change in one or both binding partners. Prior studies have demonstrated that AF-multimer (AFm) can predict accurate protein complexes in only up to 43% of cases [1]. In this work, we combine AlphaFold as a structural template generator with a physics-based replica exchange docking algorithm. Using a curated collection of 254 available protein targets with both unbound and bound structures, we first demonstrate that AlphaFold confidence measures (pLDDT) can be repurposed for estimating protein flexibility and docking accuracy for multimers. We incorporate these metrics within our ReplicaDock 2.0 protocol [2] to complete a robust in-silico pipeline for accurate protein complex structure prediction. AlphaRED (AlphaFold-initiated Replica Exchange Docking) successfully docks failed AF predictions including 97 failure cases in Docking Benchmark Set 5.5. AlphaRED generates CAPRI acceptable-quality or better predictions for 66% of benchmark targets. Further, on a subset of antigen-antibody targets, which is challenging for AFm (19% success rate), AlphaRED demonstrates a success rate of 51%. This new strategy demonstrates the success possible by integrating deep-learning based architectures trained on evolutionary information with physics-based enhanced sampling. The pipeline is available at github.com/Graylab/AlphaRED.

Artificial Intelligence

Biochemistry

Flexible Protein-Protein Docking with a Multi-Track Iterative Transformer

Lee‐Shin Chu et al., Jul 1, 2023

Abstract: Conventional protein-protein docking algorithms usually rely on heavy candidate sampling and re-ranking, but these steps are time-consuming and hinder applications that require high-throughput complex structure prediction, e.g., structure-based virtual screening. Existing deep learning methods for protein-protein docking, despite being much faster, suffer from low docking success rates. In addition, they simplify the problem by assuming no conformational changes within any protein upon binding (rigid docking). This assumption precludes applications where binding-induced conformational changes play a role, such as allosteric inhibition or docking from uncertain unbound model structures. To address these limitations, we present GeoDock, a multi-track iterative transformer network to predict a docked structure from separate docking partners. Unlike deep learning models for protein structure prediction that input multiple sequence alignments (MSAs), GeoDock inputs just the sequences and structures of the docking partners, which suits tasks where the individual structures are given. GeoDock is flexible at the protein residue level, allowing the prediction of conformational changes upon binding. For a benchmark set of rigid targets, GeoDock obtains a 41% success rate, outperforming all the other tested methods. For a more challenging benchmark set of flexible targets, GeoDock achieves a similar number of top-model successes as the traditional method ClusPro [1], but fewer than ReplicaDock2 [2]. GeoDock attains an average inference speed of under one second on a single GPU, enabling its application in large-scale structure screening. Although binding-induced conformational changes are still a challenge owing to limited training and evaluation data, our architecture sets up the foundation to capture this backbone flexibility. Code and a demonstration Jupyter notebook are available at https://github.com/Graylab/GeoDock.

Artificial Intelligence

Biochemistry

Towards generalizable prediction of antibody thermostability using machine learning on sequence and structure features

Ameya Harmalkar et al., Jun 4, 2022

Over the last three decades, the appeal of monoclonal antibodies (mAbs) as therapeutics has steadily increased, as evidenced by the FDA's recent landmark approval of the 100th mAb. Unlike mAbs that bind to single targets, multispecific biologics (bsAbs) with their single-chain variable fragment (scFv) modules have garnered particular interest owing to the advantage of engaging distinct targets. Despite their exquisite specificity and affinity, the relatively poor thermostability of these scFv modules often hampers their development as potential therapeutic drugs. In recent years, engineering antibody sequences to enhance their stability by mutations has gained considerable momentum. As experimental methods for antibody engineering are time-intensive, laborious, and expensive, computational methods serve as a fast and inexpensive alternative to conventional routes. In this work, we show two machine learning methods, one using pre-trained language models (PTLM) that capture functional effects of sequence variation and the other a supervised convolutional neural network (CNN) trained with Rosetta energetic features, to better classify thermostable scFv variants from sequence. Both models are trained on temperature-specific data (TS50 measurements) derived from multiple libraries of scFv sequences. We show that a sufficiently simple CNN model trained with energetic features generalizes better than a pre-trained language model on out-of-distribution (blind) sequences (average Spearman correlation coefficient of 0.4 as opposed to 0.15). Further, we demonstrate that for an independent mAb with available thermal melting temperatures for 20 experimentally characterized thermostable mutations, these models trained on TS50 data could identify 18 residue positions and 5 identical amino-acid mutations, showing remarkable generalizability.
Our results suggest that such models can be broadly applicable for improving the biological characteristics of antibodies. Further, transferring such models to alternative physico-chemical properties of scFvs can have potential applications in optimizing large-scale production and delivery of mAbs or bsAbs.
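
The abstract above compares models by their Spearman correlation coefficient on blind sequences. As a reminder of what that metric measures, here is a minimal sketch in plain Python: it ranks both lists (tied values share the mean rank) and takes the Pearson correlation of the ranks. The function names and the example values are hypothetical and are not taken from the paper or its code.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: measured thermostability values vs. model scores.
measured = [55.0, 61.5, 58.2, 70.1, 64.3]
predicted = [0.20, 0.45, 0.50, 0.90, 0.40]
rho = spearman(measured, predicted)
```

Because only the ordering matters, a model can score well on this metric even if its raw outputs are not calibrated to temperature units.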

Artificial Intelligence

Biochemistry

Colicin-mediated transport of DNA through the iron transporter FepA

Ruth Cohen-Khait et al., May 11, 2021

Abstract: Colicins are protein antibiotics used by bacteria to eliminate competing Escherichia coli. Colicins frequently exploit outer membrane (OM) nutrient transporters to penetrate the strictly impermeable bacterial cellular envelope. Here, applying live-cell fluorescence imaging, we were able to follow colicin B (ColB) into E. coli and localize it within the periplasm. We further demonstrate that single-stranded DNA coupled to ColB is also transported into the periplasm, emphasizing that the import routes of colicins can be exploited to carry large cargo molecules into bacteria. Moreover, we characterize the molecular mechanism of ColB association with its OM receptor FepA, applying a combination of photo-activated crosslinking, mass spectrometry, and structural modeling. We demonstrate that complex formation is coincident with a large-scale conformational change in the colicin. Finally, in vivo crosslinking experiments and supplementary simulations of the translocation process indicate that part of the colicin engages active transport by disguising itself as part of the cellular receptor.

Genetics

Ecology

Induced fit with replica exchange improves protein complex structure prediction

Ameya Harmalkar et al., Dec 10, 2021

Despite the progress in prediction of protein complexes over the last decade, recent blind protein complex structure prediction challenges revealed limited success rates (less than 20% models with DockQ score > 0.4) on targets that exhibit significant conformational change upon binding. To overcome limitations in capturing backbone motions, we developed a new, aggressive sampling method that incorporates temperature replica exchange Monte Carlo (T-REMC) and conformational sampling techniques within docking protocols in Rosetta. Our method, ReplicaDock 2.0, mimics the induced-fit mechanism of protein binding to sample backbone motions across putative interface residues on-the-fly, thereby recapitulating binding-partner induced conformational changes. Furthermore, ReplicaDock 2.0 clocks in at 150-500 CPU hours per target (protein-size dependent), a runtime that is significantly faster than Molecular Dynamics based approaches. For a benchmark set of 88 proteins with moderate to high flexibility (unbound-to-bound iRMSD over 1.2 Å), ReplicaDock 2.0 successfully docks 61% of moderately flexible complexes and 35% of highly flexible complexes. Additionally, we demonstrate that by biasing backbone sampling particularly towards residues comprising flexible loops or hinge domains, highly flexible targets can be predicted to under 2 Å accuracy. This indicates that additional gains are possible when mobile protein segments are known.

Significance Statement: Proteins bind each other in a highly specific and regulated manner, and these associated dynamics of binding are intimately linked to their function. Conventional techniques of structure determination such as cryo-EM, X-ray crystallography and NMR are time-consuming and arduous. Using a temperature-replica exchange Monte Carlo approach that mimics the kinetic mechanism of “induced fit” binding, we improved prediction of protein complex structures, particularly for targets that exhibit considerable conformational changes upon binding (interface root mean square deviation (unbound-bound) > 1.2 Å). Capturing these binding-induced conformational changes in proteins can aid us in better understanding biological mechanisms and suggest intervention strategies for disease mechanisms.
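
The abstract above relies on temperature replica exchange Monte Carlo. For readers unfamiliar with the technique, the following is a minimal sketch of the textbook swap criterion between two replicas, not the ReplicaDock 2.0 implementation: a swap is accepted with probability min(1, exp[(β_i − β_j)(E_i − E_j)]), where β = 1/kT. Temperatures are given directly as kT in the same arbitrary units as the energies, and all function and variable names are hypothetical.

```python
import math
import random


def swap_probability(energy_i, energy_j, kt_i, kt_j):
    """Metropolis acceptance probability for exchanging replicas i and j."""
    beta_i = 1.0 / kt_i
    beta_j = 1.0 / kt_j
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    # Checking the sign first avoids overflow in exp() for large positive delta.
    return 1.0 if delta >= 0 else math.exp(delta)


def attempt_swaps(energies, kts, rng, offset=0):
    """Try to exchange neighbouring replicas (pairs start at index `offset`).

    `energies[k]` is the energy of the configuration currently held at
    temperature `kts[k]`. Returns a new list with accepted swaps applied.
    """
    result = list(energies)
    for i in range(offset, len(result) - 1, 2):
        p = swap_probability(result[i], result[i + 1], kts[i], kts[i + 1])
        if rng.random() < p:
            result[i], result[i + 1] = result[i + 1], result[i]
    return result


# Hypothetical ladder of three temperatures (kT) and current energies.
rng = random.Random(0)
ladder = [1.0, 2.0, 4.0]
after = attempt_swaps([-5.0, -20.0, -12.0], ladder, rng)
```

In a full simulation each replica would run ordinary Monte Carlo moves between swap attempts, and the offset would alternate between 0 and 1 so that every neighbouring pair gets a chance to exchange.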

Biochemistry

Biophysics

Advancing Membrane-Associated Protein Docking with Improved Sampling and Scoring in Rosetta

Rituparna Samanta et al., Nov 22, 2024

The oligomerization of protein macromolecules on cell membranes plays a fundamental role in regulating cellular function. From modulating signal transduction to directing immune response, membrane proteins (MPs) play a crucial role in biological processes and are often the target of many pharmaceutical drugs. Despite their biological relevance, the challenges in experimental determination have hampered the structural availability of membrane proteins and their complexes. Computational docking provides a promising alternative to model membrane protein complex structures. Here, we present Rosetta-MPDock, a flexible transmembrane (TM) protein docking protocol that captures binding-induced conformational changes. Rosetta-MPDock samples large conformational ensembles of flexible monomers and docks them within an implicit membrane environment. We benchmarked this method on 29 TM-protein complexes of variable backbone flexibility. These complexes are classified based on the root-mean-square deviation between the unbound and bound states (RMSD

Molecular Biology

Cell Biology
