New computational model allows more accurate prediction of antibody structure
By adapting artificial intelligence models known as large language models, researchers have made major advances in predicting protein structure from sequence. However, this approach has been less successful for antibodies, in part because of the hypervariability found in this type of protein.
To overcome this limitation, researchers at MIT have developed computational techniques that allow large language models to predict antibody structures more accurately. Their work could allow researchers to sift through millions of potential antibodies to identify those that could be used to treat SARS-CoV-2 and other infectious diseases.
The research results will be published in the Proceedings of the National Academy of Sciences.
“Our method can scale to the point where you can actually find a few needles in a haystack, which other methods can’t do,” says Bonnie Berger, the Simmons Professor of Mathematics, head of the Computation and Biology group at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), and one of the senior authors of the new study. “If we can stop drug companies from running clinical trials with the wrong stuff, we could really save a lot of money.”
The technology focuses on modeling the hypervariable regions of antibodies and also has the potential to analyze an individual’s entire antibody repertoire. This could help study the immune responses of people who are super responders to diseases such as HIV and understand why their antibodies protect against the virus so effectively.
Bryan Bryson, an associate professor of biological engineering at MIT and a member of the Ragon Institute of MGH, MIT, and Harvard, is also a senior author of the paper. Rohit Singh, a former CSAIL research scientist who is now an assistant professor of biostatistics and bioinformatics and of cell biology at Duke University, and recent graduate Chiho Im are the paper’s lead authors. Researchers from Sanofi and ETH Zurich also contributed to the study.
Modeling hypervariability
Proteins are made up of long chains of amino acids that can be folded into a vast number of possible structures. In recent years, predicting these structures has become much easier using artificial intelligence programs such as AlphaFold.
Many of these programs, such as ESMFold and OmegaFold, are based on large language models. These models were originally developed to analyze large amounts of text, allowing them to learn to predict the next word in a sequence. This same approach can be applied to protein sequences by learning which protein structures are most likely to form from different amino acid patterns.
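To make the analogy concrete, here is a deliberately tiny sketch of the next-token idea applied to amino-acid sequences. This is a toy bigram counter, not the actual architecture of ESMFold or OmegaFold, and the peptide fragments are invented for illustration.

```python
from collections import Counter, defaultdict

def train_bigram_model(sequences):
    """Count residue-pair frequencies: a toy stand-in for the
    next-token objective that protein language models optimize."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, prev_residue):
    """Return the most frequently observed residue after `prev_residue`."""
    if prev_residue not in counts:
        return None
    return counts[prev_residue].most_common(1)[0][0]

# Hypothetical toy "training set" of short peptide fragments.
toy_sequences = ["ACDEFG", "ACDKLM", "ACDEGH"]
model = train_bigram_model(toy_sequences)
print(predict_next(model, "C"))  # every fragment follows C with D
```

Real protein language models replace these bigram counts with transformer networks trained on millions of sequences, but the objective — predict what comes next, given context — is the same.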
However, this technique does not always work for antibodies, especially the segments of antibodies known as hypervariable regions. Antibodies typically have a Y-shaped structure, and these hypervariable regions are located at the tip of the Y to detect and bind to foreign proteins, also known as antigens. The bottom of the Y provides structural support and helps antibodies interact with immune cells.
Hypervariable regions vary in length but typically contain fewer than 40 amino acids. By varying the sequence of these amino acids, the human immune system is estimated to be able to produce up to 1 quintillion different antibodies, helping to ensure that the body can respond to a wide variety of potential antigens. These sequences are not evolutionarily constrained in the same way as other protein sequences, making it difficult for large language models to learn to predict their structures accurately.
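The quintillion figure follows directly from combinatorics: each position in a hypervariable region can hold any of the 20 standard amino acids, so diversity grows as 20 raised to the region's length, and a stretch of only 14 positions already exceeds one quintillion (10^18):

```python
# Diversity of a variable region grows as 20**length,
# since each position can hold any of 20 standard amino acids.
n_amino_acids = 20
for length in (10, 14, 20):
    print(f"{length} positions -> {n_amino_acids**length:.2e} sequences")
```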
“Part of the reason language models can predict protein structure so well is that evolution constrains these sequences, and the model can decipher what those constraints mean,” Singh says. “It’s similar to learning the rules of grammar by looking at the context of words in a sentence and figuring out what they mean.”
To model these hypervariable regions, the researchers built two modules on top of existing protein language models. The first module was trained on hypervariable sequences from approximately 3,000 antibody structures in the Protein Data Bank (PDB), allowing it to learn which sequences tend to produce similar structures. The second module was trained on data correlating approximately 3,700 antibody sequences with their binding strength to three different antigens.
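The pattern described above — a frozen, pretrained backbone with small task-specific heads fine-tuned on top — can be sketched as follows. Everything here is hypothetical and drastically simplified: the "backbone" is just amino-acid composition rather than a real language-model embedding, the heads carry random untrained weights, and the sequence is invented.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def backbone_embedding(seq):
    """Stand-in for a frozen protein-language-model embedding:
    here, just the normalized amino-acid composition (20-dim)."""
    return [seq.count(aa) / max(len(seq), 1) for aa in AMINO_ACIDS]

class Head:
    """A tiny linear head on top of the frozen backbone.
    Weights are randomly initialized; real training is omitted."""
    def __init__(self, dim=20, seed=0):
        rng = random.Random(seed)
        self.w = [rng.uniform(-1, 1) for _ in range(dim)]

    def score(self, seq):
        emb = backbone_embedding(seq)
        return sum(wi * xi for wi, xi in zip(self.w, emb))

# One head per supervised task, mirroring the paper's two modules:
structure_head = Head(seed=1)  # structure similarity (~3,000 PDB antibodies)
affinity_head = Head(seed=2)   # binding strength (~3,700 labeled sequences)

cdr = "GFTFSSYA"  # a hypothetical hypervariable fragment
print(structure_head.score(cdr), affinity_head.score(cdr))
```

The design choice this illustrates is transfer learning: the expensive general-purpose model is reused unchanged, while only the small heads need antibody-specific training data.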
The resulting computational model, known as AbMap, can predict antibody structures and binding strength from amino acid sequences. To demonstrate its utility, the researchers used it to predict antibody structures that strongly neutralize the spike protein of the SARS-CoV-2 virus.
The researchers started with a set of antibodies predicted to bind to this target, then generated millions of variants by changing the hypervariable regions. Their model identified the most successful antibody structures far more accurately than traditional protein structure models based on large language models.
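The generate-and-rank loop described here can be sketched in a few lines. This is a minimal illustration, not the researchers' pipeline: the parent sequence, the "high-affinity" target, and the scoring function (simple similarity to that target, standing in for a learned binding predictor) are all invented for the example.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def mutate(cdr, n_mutations, rng):
    """Return a variant of the hypervariable region with
    `n_mutations` random residue substitutions."""
    seq = list(cdr)
    for pos in rng.sample(range(len(seq)), n_mutations):
        seq[pos] = rng.choice(AMINO_ACIDS)
    return "".join(seq)

def toy_score(variant, target="GFTFSDYA"):
    """Hypothetical stand-in for a learned binding-strength model:
    positional similarity to an invented high-affinity sequence."""
    return sum(a == b for a, b in zip(variant, target))

rng = random.Random(42)
parent = "GFTFSSYA"  # hypothetical starting antibody CDR
variants = {mutate(parent, 2, rng) for _ in range(10_000)}
ranked = sorted(variants, key=toy_score, reverse=True)
print(ranked[0], toy_score(ranked[0]))
```

At realistic scale the candidate pool holds millions of variants, which is why a fast, accurate scoring model — rather than experimental testing of every variant — is the bottleneck this work addresses.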
The researchers then took the additional step of clustering the antibodies into groups with similar structures. Working with researchers at Sanofi, they selected antibodies from each of these clusters to test experimentally. These experiments showed that 82% of the antibodies had better binding strength than the original antibodies that went into the model.
Identifying a variety of good candidates early in the development process could help drug companies avoid spending large sums of money testing candidates that later fail, the researchers said.
“They don’t want to put all their eggs in one basket,” Singh says. “They don’t want to say, ‘I’m going to do preclinical testing with this one antibody,’ and then have it turn out to be toxic. We want a set of antibody candidates to move forward, so that if something goes wrong, we have some options.”
Antibody comparison
Using this technology, researchers could also try to answer the long-standing question of why different people respond differently to infectious diseases. For example, why do some people develop more severe forms of COVID-19, and why do some people not get infected when exposed to HIV?
Scientists have attempted to answer these questions by performing single-cell RNA sequencing of individuals’ immune cells and comparing them. This process is known as antibody repertoire analysis. Previous studies have shown that the antibody repertoires of two different people may overlap by as little as 10%. However, because two antibodies with different sequences can have similar structures and functions, sequencing does not provide as comprehensive a picture of an antibody’s performance as structural information.
The new model helps solve this problem by rapidly generating the structures of all antibodies found within an individual. In this study, the researchers showed that when structure is taken into account, there is much more overlap between individuals than the 10% found in sequence comparisons. They now plan to further investigate how these structures contribute to the body’s overall immune response to specific pathogens.
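The sequence-versus-structure comparison can be made concrete with a small sketch. The repertoires and the "structural fingerprint" below are entirely hypothetical (a coarse length-plus-composition bin standing in for AbMap's predicted-structure clusters); the point is only that structure-level grouping finds overlap that exact sequence matching misses.

```python
def structure_key(seq):
    """Hypothetical coarse structural fingerprint: sequences with the
    same length and hydrophobic-residue count fall in the same bin,
    standing in for clusters of predicted antibody structures."""
    hydrophobic = set("AVILMFWY")
    return (len(seq), sum(r in hydrophobic for r in seq))

def overlap(a, b, key=lambda s: s):
    """Jaccard overlap between two repertoires under a grouping key."""
    ka, kb = {key(s) for s in a}, {key(s) for s in b}
    return len(ka & kb) / len(ka | kb)

# Invented mini-repertoires for two individuals.
person1 = ["GFTFSSYA", "ARDRGYSS", "AKDIQYGN"]
person2 = ["GFTFSDYA", "ARDRGYTS", "AKDIQYGN"]

seq_overlap = overlap(person1, person2)                  # exact sequences
struct_overlap = overlap(person1, person2, structure_key)  # structural bins
print(seq_overlap, struct_overlap)  # prints 0.2 1.0
```

Only one sequence matches exactly, but all three pairs share a structural bin — mirroring the paper's finding that structure-aware comparison reveals far more repertoire overlap than the roughly 10% seen at the sequence level.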
“Language models are a great fit here because they approach the accuracy of structure-based analysis while having the scalability of sequence-based analysis,” says Singh.
Further information: Rohit Singh et al. Learning the language of antibody hypervariability, Proceedings of the National Academy of Sciences (2024). DOI: 10.1073/pnas.2418918121
Provided by Massachusetts Institute of Technology
This article is republished courtesy of MIT News (web.mit.edu/newsoffice/), a popular site covering news about MIT research, innovation, and education.
Citation: New computational model can more accurately predict antibody structures (January 2, 2025), retrieved January 2, 2025 from https://phys.org/news/2025-01-antibody-accurately.html
This document is subject to copyright. No part may be reproduced without written permission, except in fair dealing for personal study or research purposes. Content is provided for informational purposes only.