New Method for Scalable Protein Design Expands Discovery of Therapeutics
CPTx.bio, October 25th, 2024
An international consortium has published a new method for scalable protein design, called relaxed sequence optimization (RSO), in the journal Science (https://doi.org/10.1126/science.adq1741). The consortium includes researchers from the Technical University of Munich (where CPTx originated), Fudan University in Shanghai, Harvard University, and the Massachusetts Institute of Technology (MIT). The study draws on advances in machine learning for protein structure prediction and design, building on the groundbreaking work of 2024 Nobel Prize in Chemistry laureates David Baker, Demis Hassabis, and John Jumper. RSO introduces an efficient way to perform gradient-descent-based protein design. It enables the creation of large, complex proteins of up to 1,000 amino acids, with potential applications in new vaccine and therapeutic programs.
The research was co-led by Hendrik Dietz, founder and CEO of CPTx, and Sergey Ovchinnikov (MIT). It arrives as protein design is becoming an increasingly widely used tool in biotechnology, following significant advances in predictive modeling. Grant Boldt, COO of CPTx, explains: “At CPTx, we are very excited about the potential of scalable protein design. This innovative method allows exploring a wider range of functional protein sequences efficiently, accelerating research and delivering breakthrough treatments to patients faster.” The company sees potential in leveraging RSO to develop composite programmable therapeutics, adding to its growing portfolio of biopharmaceutical assets aimed at addressing outsized medical needs.
Protein Design Pipeline Concept:
The RSO pipeline performs protein design in a relaxed sequence space, which makes gradient-based optimization of protein structures possible. Previous methods forced sequence optimization into rigid, discrete sequence spaces, meaning spaces that contain only physically possible protein sequences, with exactly one amino acid at each position. RSO instead optimizes continuously in a relaxed space, where each residue is represented as a probability distribution over the 20 standard amino acids. This yields smoother transitions during optimization and avoids the disruptions caused by rigid, discrete encoding.
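A minimal sketch of what a relaxed sequence can look like in code, assuming it is parameterized as unconstrained per-residue logits passed through a softmax. This is a common choice, but the release does not say how RSO parameterizes the relaxed space. The names `AMINO_ACIDS`, `relaxed_sequence`, and `discretize` are hypothetical, chosen for illustration.

```python
import numpy as np

# The 20 standard amino acids in one-letter code (the ordering is arbitrary).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def relaxed_sequence(logits: np.ndarray) -> np.ndarray:
    """Map unconstrained logits of shape (L, 20) to one probability distribution per residue."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # subtract the row max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def discretize(probs: np.ndarray) -> str:
    """Collapse each residue's distribution to its most likely amino acid (a discrete sequence)."""
    return "".join(AMINO_ACIDS[i] for i in probs.argmax(axis=-1))


rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 20))  # hypothetical 8-residue design
probs = relaxed_sequence(logits)
print(probs.shape)  # (8, 20)
print(np.allclose(probs.sum(axis=-1), 1.0))  # True: every residue is a distribution
print(discretize(probs))  # one letter per residue
```

A discrete sequence is the special case in which every row is one-hot. The relaxed form lets small gradient updates shift probability mass gradually between amino acids instead of jumping from one letter to another.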
- Relaxed Sequence Input: The pipeline begins by feeding a relaxed sequence into a structure prediction network (AlphaFold2), which predicts its structure. A mathematical loss function then measures how well this structure matches the target objectives, such as compactness, stability, or the number of contacts with a target receptor.
- Gradient-Based Optimization: The loss is backpropagated through the network to calculate gradients with respect to the sequence, and these gradients are used to iteratively update the sequence toward the target properties. Unlike in previous methods, the relaxed sequence is not immediately forced into a discrete state; it remains a weighted mixture of amino acids at each residue.
These first two steps were implemented in ColabDesign, an environment initially developed by Sergey Ovchinnikov and his team at MIT, with contributions from the wider community.
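The loop formed by these two steps can be sketched in runnable form, assuming a softmax-relaxed sequence. AlphaFold2 cannot be bundled into a short example, so a hypothetical linear `toy_predictor` stands in for the structure network, and the gradient is derived by hand. In the actual pipeline, gradients come from backpropagating the loss through the structure prediction network. All function names, the property scale `w`, and the `target` profile are illustrative assumptions.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def toy_predictor(probs: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for AlphaFold2: one scalar 'property' per residue, linear in the probabilities."""
    return probs @ w  # (L, 20) @ (20,) -> (L,)


def loss_and_grad(logits: np.ndarray, w: np.ndarray, target: np.ndarray):
    """Mean squared error to a target profile, plus its gradient with respect to the logits."""
    p = softmax(logits)
    s = toy_predictor(p, w)
    loss = float(np.mean((s - target) ** 2))
    dL_ds = 2.0 * (s - target) / len(target)  # (L,)
    dL_dp = dL_ds[:, None] * w[None, :]  # (L, 20)
    # Backpropagate through the softmax: dL/dz_b = p_b * (dL/dp_b - sum_a p_a * dL/dp_a)
    dL_dz = p * (dL_dp - (p * dL_dp).sum(axis=-1, keepdims=True))
    return loss, dL_dz


def optimize(logits, w, target, lr=0.5, steps=400):
    """Plain gradient descent on the relaxed sequence; the logits stay continuous throughout."""
    history = []
    for _ in range(steps):
        loss, grad = loss_and_grad(logits, w, target)
        history.append(loss)
        logits = logits - lr * grad
    return logits, history


rng = np.random.default_rng(0)
w = rng.normal(size=20)  # hypothetical per-amino-acid property scale
target = np.tile([w.max(), w.min()], 5)  # hypothetical 10-residue alternating target profile
logits0 = rng.normal(scale=0.1, size=(10, 20))
logits, history = optimize(logits0, w, target)
print(f"loss: {history[0]:.3f} -> {history[-1]:.3f}")
```

The target values here are the extremes of `w`, so the loss can only approach zero as each residue's distribution concentrates on a single amino acid. The relaxed sequence therefore drifts toward a discrete one as optimization proceeds, without being forced there at every step.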
- ProteinMPNN Module: Once the designed backbone converges, ProteinMPNN, a neural network trained by David Baker’s team at the University of Washington, is used to generate realistic candidate sequences for it. ProteinMPNN was specifically designed to generate protein sequences that are more likely to be produced successfully in bacteria using recombinant protein expression.
- In-Silico Validation: The candidate sequences are run back through AlphaFold2 or ESMFold for structure prediction, and the predicted structures are compared with the target backbone geometry. Candidates that meet the design specifications can then be advanced to experimental testing.
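The comparison with the target backbone can be made concrete with a structural similarity metric. The release does not name the metric used, so the sketch below assumes one common choice: the root-mean-square deviation (RMSD) of Cα coordinates after optimal superposition with the Kabsch algorithm. It also applies a hypothetical 2.0 Å acceptance threshold. The function names, the threshold, and the test coordinates are illustrative.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition (Kabsch algorithm)."""
    P = P - P.mean(axis=0)  # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q  # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against an improper rotation (reflection)
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # rotation mapping P onto Q
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def passes_validation(predicted_ca, target_ca, max_rmsd=2.0):
    """Hypothetical acceptance rule: keep a candidate whose predicted CA trace lies within max_rmsd (in Å) of the target."""
    return kabsch_rmsd(predicted_ca, target_ca) <= max_rmsd


rng = np.random.default_rng(0)
target = rng.normal(scale=10.0, size=(50, 3))  # hypothetical 50-residue CA trace
theta = np.deg2rad(30.0)
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta), np.cos(theta), 0.0],
                [0.0, 0.0, 1.0]])
same_fold = target @ rot.T + np.array([5.0, -3.0, 1.0])  # rotated and shifted copy of the target
wrong_fold = target + rng.normal(scale=5.0, size=target.shape)  # heavily perturbed structure
print(passes_validation(same_fold, target))  # True
print(passes_validation(wrong_fold, target))  # False
```

Superposing before measuring matters because a predicted structure is usually placed in an arbitrary frame. Without the rotation step, an identical fold in a different orientation would be rejected.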
The pipeline’s relaxed optimization allows rapid prototyping of protein backbones, which can be tuned for different applications, such as designing binders, functional scaffolds, or large proteins (currently up to 1,000 amino acids). By adjusting the loss function, researchers can steer the design toward specific features such as stability, binding capacity, or secondary-structure preferences.
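The sketch below illustrates tuning the loss by combining two hypothetical terms into a weighted sum computed from Cα coordinates. The first is a radius-of-gyration compactness term; the second is a smooth count of contacts with a target receptor. These terms, their names, the 8 Å cutoff, and the weights are assumptions for illustration, not the loss used in the study. In a gradient-based pipeline each term would be computed from the predicted structure and differentiated, which is why the contact count uses a smooth sigmoid rather than a hard threshold.

```python
import numpy as np


def radius_of_gyration(ca: np.ndarray) -> float:
    """Compactness term: RMS distance of CA atoms from their centroid (smaller = more compact)."""
    centered = ca - ca.mean(axis=0)
    return float(np.sqrt((centered ** 2).sum(axis=1).mean()))


def soft_contacts(binder_ca, receptor_ca, cutoff=8.0, width=1.0) -> float:
    """Binding term: smooth (differentiable) count of binder-receptor CA pairs closer than `cutoff` Å."""
    d = np.linalg.norm(binder_ca[:, None, :] - receptor_ca[None, :, :], axis=-1)
    x = np.clip((d - cutoff) / width, -50.0, 50.0)  # clip to avoid overflow in exp
    return float((1.0 / (1.0 + np.exp(x))).sum())


def design_loss(binder_ca, receptor_ca, weights):
    """Weighted sum of terms; changing `weights` steers the design toward different properties."""
    terms = {
        "compactness": radius_of_gyration(binder_ca),
        "binding": -soft_contacts(binder_ca, receptor_ca),  # more contacts -> lower loss
    }
    return sum(weights[name] * value for name, value in terms.items()), terms


rng = np.random.default_rng(0)
receptor = rng.normal(scale=4.0, size=(30, 3))  # hypothetical receptor CA coordinates
binder = rng.normal(scale=4.0, size=(20, 3))  # hypothetical binder CA coordinates
near = binder + np.array([6.0, 0.0, 0.0])
far = binder + np.array([60.0, 0.0, 0.0])
weights = {"compactness": 1.0, "binding": 0.1}
loss_near, _ = design_loss(near, receptor, weights)
loss_far, _ = design_loss(far, receptor, weights)
print(loss_near < loss_far)  # True: same shape, but more receptor contacts
```

Raising `weights["binding"]` relative to `weights["compactness"]` makes the optimizer favor more interface contacts over a tighter fold. Adding or removing terms in the same dictionary is how other preferences could be expressed.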
Experimental Validation:
More than 100 RSO-designed proteins were experimentally purified and analyzed with techniques such as size-exclusion chromatography (SEC) and circular dichroism (CD) spectroscopy. The structures of five of these proteins, ranging from 200 to 1,000 amino acids, were also characterized experimentally using cryo-electron microscopy (cryo-EM) and X-ray crystallography. The results showed excellent agreement between the designed and experimental structures.
The technology was thus validated both computationally (via in-silico comparisons and benchmarking) and experimentally (through protein expression and structural analysis). For more information, see the article in Science linked below.
Scalable Protein Design Using Optimization in a Relaxed Sequence Space, Science, October 25th, 2024
Christopher Frank, Ali Khoshouei, Lara Fuss, Dominik Schiwietz, Dominik Putz, Lara Weber, Zhixuan Zhao, Motoyuki Hattori, Shihao Feng, Yosta de Stigter, Sergey Ovchinnikov, Hendrik Dietz
https://doi.org/10.1126/science.adq1741
Illustrations:
The accompanying image illustrates the in-silico backbone optimization of a protein using RSO (from the supplementary information of the paper). Protein binders can also be designed using this method.