ResearchHub | Open Science Community

The Rosetta All-Atom Energy Function for Macromolecular Modeling and Design

Rebecca Alford et al., Apr 21, 2017

Over the past decade, the Rosetta biomolecular modeling suite has informed diverse biological questions and engineering challenges ranging from interpretation of low-resolution structural data to design of nanomaterials, protein therapeutics, and vaccines. Central to Rosetta's success is the energy function: a model parametrized from small-molecule and X-ray crystal structure data used to approximate the energy associated with each biomolecule conformation. This paper describes the mathematical models and physical concepts that underlie the latest Rosetta energy function, called the Rosetta Energy Function 2015 (REF15). Applying these concepts, we explain how to use Rosetta energies to identify and analyze the features of biomolecular models. Finally, we discuss the latest advances in the energy function that extend its capabilities from soluble proteins to also include membrane proteins, peptides containing noncanonical amino acids, small molecules, carbohydrates, nucleic acids, and other macromolecules.

Artificial Intelligence

Biochemistry

Ensuring scientific reproducibility in bio-macromolecular modeling via extensive, automated benchmarks

Julia Leman et al., Apr 5, 2021

Each year vast international resources are wasted on irreproducible research. The scientific community has been slow to adopt standard software engineering practices, despite the increases in high-dimensional data, complexities of workflows, and computational environments. Here we show how scientific software applications can be created in a reproducible manner when simple design goals for reproducibility are met. We describe the implementation of a test server framework and 40 scientific benchmarks, covering numerous applications in Rosetta bio-macromolecular modeling. High performance computing cluster integration allows these benchmarks to run continuously and automatically. Detailed protocol captures are useful for developers and users of Rosetta and other macromolecular modeling tools. The framework and design concepts presented here are valuable for developers and users of any type of scientific software and for the scientific community to create reproducible methods. Specific examples highlight the utility of this framework and the comprehensive documentation illustrates the ease of adding new tests in a matter of hours.

Software

Information Systems And Management

Novel sampling strategies and a coarse-grained score function for docking homomers, flexible heteromers, and oligosaccharides using Rosetta in CAPRI Rounds 37–45

Shourya Burman et al., Aug 30, 2019

CAPRI Rounds 37 through 45 introduced larger complexes, new macromolecules, and multi-stage assemblies. For these rounds, we used and expanded docking methods in Rosetta to model 23 target complexes. We successfully predicted 14 target complexes and recognized and refined near-native models generated by other groups for two further targets. Notably, for targets T110 and T136, we achieved the closest prediction of any CAPRI participant. We created several innovative approaches during these rounds. Since Round 39 (target 122), we have used the new RosettaDock 4.0, which has a revamped coarse-grained energy function and the ability to perform conformer selection during docking with hundreds of pre-generated protein backbones. Ten of the complexes had some degree of symmetry in their interactions, so we tested Rosetta SymDock, realized its shortcomings, and developed the next-generation symmetric docking protocol, SymDock2, which includes docking of multiple backbones and induced-fit refinement. Since the last CAPRI assessment, we also developed methods for modeling and designing carbohydrates in Rosetta, and we used them to successfully model oligosaccharide–protein complexes in Round 41. While the results were broadly encouraging, they also highlighted the pressing need to invest in (1) flexible docking algorithms with the ability to model loop and linker motions and in (2) new sampling and scoring methods for oligosaccharide–protein interactions.

Biophysics

Pharmacology

Repertoire analysis of antibody CDR-H3 loops suggests affinity maturation does not typically result in rigidification

Jeliazko Jeliazkov et al., Dec 8, 2017

Antibodies can rapidly evolve in specific response to antigens. Affinity maturation drives this evolution through cycles of mutation and selection leading to enhanced antibody specificity and affinity. Elucidating the biophysical mechanisms that underlie affinity maturation is fundamental to understanding B-cell immunity. An emergent hypothesis is that affinity maturation reduces the conformational flexibility of the antibody's antigen-binding paratope to minimize entropic losses incurred upon binding. In recent years, computational and experimental approaches have tested this hypothesis on a small number of antibodies, often observing a decrease in the flexibility of the Complementarity Determining Region (CDR) loops that typically comprise the paratope and in particular the CDR-H3 loop, which contributes a plurality of antigen contacts. However, there were a few exceptions, and previous studies were limited to a small handful of cases. Here, we determined the structural flexibility of the CDR-H3 loop for thousands of recently-determined homology models of the human peripheral blood cell antibody repertoire using rigidity theory. We found no clear delineation in the flexibility of naïve and antigen-experienced antibodies. To account for possible sources of error, we additionally analyzed hundreds of human and mouse antibodies in the Protein Data Bank through both rigidity theory and B-factor analysis. By both metrics, we observed only a slight decrease in the CDR-H3 loop flexibility when comparing affinity-matured antibodies to naïve antibodies, and the decrease was not as drastic as previously reported. Further analysis, incorporating molecular dynamics (MD) simulations, revealed a spectrum of changes in flexibility. Our results suggest that rigidification may be just one of many biophysical mechanisms for increasing affinity.

Genetics

Biophysics

Robustification of RosettaAntibody and Rosetta SnugDock

Jeliazko Jeliazkov et al., May 26, 2020

In recent years, the observed antibody sequence space has grown exponentially due to advances in high-throughput sequencing of immune receptors. The rise in sequences has not been mirrored by a rise in structures, as experimental structure determination techniques have remained low-throughput. Computational modeling, however, has the potential to close the sequence–structure gap. To achieve this goal, computational methods must be robust, fast, easy to use, and accurate. Here we report on the latest advances made in RosettaAntibody and Rosetta SnugDock, methods for antibody structure prediction and antibody–antigen docking. We simplified the user interface, expanded and automated the template database, generalized the kinematics of antibody–antigen docking (which enabled modeling of single-domain antibodies) and incorporated new loop modeling techniques. To evaluate the effects of our updates on modeling accuracy, we developed rigorous tests under a new scientific benchmarking framework within Rosetta. Benchmarking revealed that more structurally similar templates could be identified in the updated database and that SnugDock broadened its applicability without losing accuracy. However, there are further advances to be made, including increasing the accuracy and speed of CDR-H3 loop modeling, before computational approaches can accurately model any antibody.

Artificial Intelligence

Biochemistry

Toward computational design of protein crystals with improved resolution

Jeliazko Jeliazkov et al., Jun 2, 2019

Substantial advances have been made in the computational design of protein interfaces over the last 20 years. However, the interfaces targeted by design have typically been stable and high affinity. Here, we report the development of a generic computational design method to stabilize the weak interactions at crystallographic interfaces. Initially, we analyzed structures reported in the Protein Data Bank (PDB) to determine whether crystals with more stable interfaces result in higher resolution structures. We found that, for twenty-two variants of a single protein crystallized by a single individual, Rosetta score correlates with resolution. We next developed and tested a computational design protocol, seeking to identify point mutations that would improve resolution on a highly stable variant of staphylococcal nuclease (SNase Δ+PHS). Only one of eleven initial designs crystallized, forcing us to re-evaluate our strategy and base our designs on an ensemble of protein backbones. Using this strategy, four of the five designed proteins crystallized. Collecting diffraction data for multiple crystals per design and solving crystal structures, we found that designed crystals improved resolution modestly and in unpredictable ways, including altering crystal space group. Post-hoc, in silico analysis showed that crystal space groups could have been predicted for four of six variants (including WT), but that resolution did not correlate with interface stability, as it did in the preliminary results. Our results show that single point mutations can have significant effects on crystal resolution and space group, and that it is possible to computationally identify such mutations, suggesting a potential design strategy to generate high-resolution protein crystals from poorly diffracting ones.

Artificial Intelligence

Biochemistry

ESMFold Hallucinates Native-Like Protein Sequences

Jeliazko Jeliazkov et al., May 24, 2023

We describe attempts to design protein sequences by inverting the protein structure prediction algorithm ESMFold. State-of-the-art protein structure prediction methods achieve high accuracy by relying on evolutionary patterns derived from either multiple sequence alignments (AlphaFold, RosettaFold) or pretrained protein language models (PLMs; ESMFold, OmegaFold). In principle, by inverting these networks, protein sequences can be designed to fulfill one or more design objectives, such as high prediction confidence, predicted protein binding, or other geometric constraints that can be expressed with loss functions. In practice, sequences designed using an inverted AlphaFold model, termed AFDesign, contain unnatural sequence profiles shown to express poorly, whereas an inverted RosettaFold network has been shown to be sensitive to adversarial sequences. Here, we demonstrate that these limitations do not extend to neural networks that include PLMs, such as ESMFold. Using an inverted ESMFold model, termed ESM-Design, we generated sequences with profiles that are both more native-like and more likely to express than sequences generated using AFDesign, but less likely to express than sequences rescued by the structure-based design method ProteinMPNN. However, the safeguard offered by the PLM came with steep increases in memory consumption, preventing proteins greater than 150 residues from being modeled on a single GPU with 80GB VRAM. During this investigation, we also observed the role played by different sequence initialization schemes, with random sampling of discrete amino acids improving convergence and model quality over any continuous random initialization method. Finally, we showed how this approach can be used to introduce sequence and structure diversification in small proteins such as ubiquitin, while respecting the sequence conservation of active site residues. Our results highlight the effects of architectural differences between structure prediction networks on zero-shot protein design.

Genetics

Artificial Intelligence

Affinity-engineered human antibodies detect celiac disease gluten pMHC complexes and inhibit T-cell activation

Rahel Frick et al., Nov 15, 2019

Antibodies specific for antigenic peptides bound to major histocompatibility complex (MHC) molecules are valuable tools for studies of antigen presentation. Such T-cell receptor (TCR)-like antibodies may also have therapeutic potential in human disease due to their ability to target disease-associated antigens with high specificity. We previously generated celiac disease (CeD) relevant TCR-like antibodies that recognize the prevalent gluten epitope DQ2.5-glia-α1a in complex with HLA-DQ2.5. Here, we report on second-generation high-affinity antibodies towards this epitope as well as a panel of novel TCR-like antibodies to another immunodominant gliadin epitope, DQ2.5-glia-α2. The strategy for affinity engineering was based on Rosetta modeling combined with pIX phage display and is applicable to similar protein engineering efforts. We isolated picomolar affinity binders and validated them in Fab and IgG format. Flow cytometry experiments with CeD biopsy material confirm the unique disease specificity of these TCR-like antibodies and reinforce the notion that B cells and plasma cells have a dominant role in gluten antigen presentation in the inflamed CeD gut. Further, the lead candidate 3.C11 potently inhibited CD4+ T-cell activation and proliferation in vitro in an HLA and epitope specific manner, pointing to a potential for targeted disease interception without compromising systemic immunity.

Biochemistry

Immunology

The Rosetta all-atom energy function for macromolecular modeling and design

Rebecca Alford et al., Feb 7, 2017

Over the past decade, the Rosetta biomolecular modeling suite has informed diverse biological questions and engineering challenges ranging from interpretation of low-resolution structural data to design of nanomaterials, protein therapeutics, and vaccines. Central to Rosetta's success is the energy function: a model parameterized from small molecule and X-ray crystal structure data used to approximate the energy associated with each biomolecule conformation. This paper describes the mathematical models and physical concepts that underlie the latest Rosetta energy function, Aasgard2017. Applying these concepts, we explain how to use Rosetta energies to identify and analyze the features of biomolecular models. Finally, we discuss the latest advances in the energy function that extend capabilities from soluble proteins to also include membrane proteins, peptides containing non-canonical amino acids, carbohydrates, nucleic acids, and other macromolecules.

Biochemistry

Biophysics

Modeling and docking antibody structures with Rosetta

Brian Weitzner et al., Aug 16, 2016

We describe Rosetta-based computational protocols for predicting the three-dimensional structure of an antibody from sequence and then docking the antibody–protein-antigen complexes. Antibody modeling leverages canonical loop conformations to graft large segments from experimentally-determined structures as well as (1) energetic calculations to minimize loops, (2) docking methodology to refine the VL–VH relative orientation, and (3) de novo prediction of the elusive complementarity determining region (CDR) H3 loop. To alleviate model uncertainty, antibody–antigen docking resamples CDR loop conformations and can use multiple models to represent an ensemble of conformations for the antibody, the antigen or both. These protocols can be run fully-automated via the ROSIE web server or manually on a computer with user control of individual steps. For best results, the protocol requires roughly 2,500 CPU-hours for antibody modeling and 250 CPU-hours for antibody–antigen docking. Tasks can be completed in under a day by using public supercomputers.

Genetics

Artificial Intelligence
