The natural language processing techniques behind OpenAI’s ChatGPT and Google search promise to shorten the time it takes to bring drugs to market. The same kind of AI that powers chatbots and interprets search queries is now being used to develop new drugs.
In the search for new disease-fighting drugs, drugmakers have long relied on laborious trial and error to identify the right compounds. But what if AI could predict the composition of a new drug molecule, the way Google understands what you are trying to search for, or an email program suggests your replies (e.g., “Got it, thanks”)?
Researchers are now aiming to do just that with a new method that applies natural language processing, the same family of AI techniques that OpenAI’s ChatGPT relies on to produce human-like responses, to the analysis and design of proteins, the basic building blocks of life and of many medicines. The method capitalizes on what biological code has in common with search queries and email text: both are represented by a series of letters.
Proteins are made up of dozens to thousands of small chemical subunits called amino acids, and scientists use a standard set of symbols to keep track of their sequences. Each amino acid corresponds to a letter of the alphabet, so a protein is written as a long string of letters, much like a sentence. Natural language algorithms, which can quickly analyze language and predict the next step in a conversation, can be trained on this biological data to create protein language models. These models encode a protein grammar, the rules that indicate which combinations of amino acids produce specific therapeutic properties, and use it to predict the sequences of letters that might make up a new drug molecule. As a result, the early stages of drug development could shrink from years to months.
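The idea of treating a protein as a sentence and predicting its next letter can be illustrated with a deliberately tiny sketch. It is not how any company named in this article builds its models: real protein language models are large neural networks trained on huge sequence databases, whereas the code below only counts which amino acid letter tends to follow which. The function names and the three training sequences are made up for illustration.

```python
from collections import Counter, defaultdict

# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def train_bigram_model(sequences):
    """Count how often each amino acid letter follows each other letter."""
    counts = defaultdict(Counter)
    for seq in sequences:
        unknown = set(seq) - set(AMINO_ACIDS)
        if unknown:
            raise ValueError(f"unexpected letters in sequence: {sorted(unknown)}")
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1
    return counts


def predict_next(counts, residue):
    """Return the letter seen most often after `residue`, or None if unseen."""
    followers = counts.get(residue)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Made-up toy sequences for illustration only (not real proteins).
toy_sequences = ["MKTAYIAKQR", "MKTLLVAKQA", "MKVAYLAKQG"]
model = train_bigram_model(toy_sequences)
print(predict_next(model, "M"))
print(predict_next(model, "K"))
```

A counting model like this only looks one letter back. The appeal of modern language models is that they can take a much longer context into account, which is what would be needed to capture anything resembling a protein grammar.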
ProFluent Bio is a Berkeley, California-based startup focused on language-based protein design. Ali Madani, the company’s founder, says, “Nature has provided us with a vast array of examples of proteins that have been carefully designed to serve multiple functions.” Madani adds, “We are learning these models from nature.”
Protein-based drugs are used to treat diseases such as heart disease, certain cancers and HIV. In the past two years, Merck, Roche’s Genentech, and a number of startups, such as Huashenzhi Pharmaceuticals and Round One Intelligence, have begun developing new drugs with the help of natural language processing. They hope this approach will not only improve the effectiveness of existing drugs and drug candidates, but also open the door to new molecules that could treat diseases such as pancreatic cancer or amyotrophic lateral sclerosis (ALS). More effective drugs for these diseases remain elusive.
“These kinds of technologies will begin to address the ‘undruggable’ areas of biology,” says Sean McClain, founder and CEO of Absci, a drug discovery company based in Vancouver, Washington.
According to computational biologists, natural language processing for drug discovery still faces significant hurdles. They caution that making too many modifications to existing protein-based drugs could cause unintended side effects, and that fully synthetic molecules would need rigorous testing to ensure they are safe for humans.
But if natural language algorithms work as their adopters hope, they could give AI new power to transform drug development. Earlier attempts to apply AI to drug discovery were held back by technological limitations or a lack of data. Proponents say recent advances in natural language processing and the dramatic drop in the cost of protein sequencing, which has yielded huge databases of amino acid sequences, have gone a long way toward solving both problems.
Because the technology is still in its early stages, major companies are currently focused on using protein language models to improve known molecules, for instance by increasing the efficacy of drug candidates. Using a naturally occurring monoclonal antibody as a starting point, these models can suggest tweaks to its amino acid sequence to improve its therapeutic efficacy.
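As a rough illustration of what "suggesting tweaks" to a sequence could look like, the sketch below enumerates every single-letter substitution of a starting sequence and ranks the variants by how much more plausible a toy statistical model finds them. Every part of it is an assumption made for illustration: the scoring model is a simple letter-pair count with add-one smoothing, not a real protein language model, the sequences are invented, and a higher score here says nothing about binding strength or therapeutic efficacy, which have to be measured in the lab.

```python
import math
from collections import Counter, defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one letter per standard amino acid


def train_counts(sequences):
    """Count how often each letter follows each other letter."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1
    return counts


def log_likelihood(counts, seq):
    """Sum of log P(next | current), with add-one smoothing over 20 letters."""
    total = 0.0
    for current, following in zip(seq, seq[1:]):
        row = counts.get(current, Counter())
        total += math.log((row[following] + 1) / (sum(row.values()) + len(AMINO_ACIDS)))
    return total


def single_substitutions(seq):
    """Yield (position, new_letter, variant) for every one-letter change."""
    for position, original in enumerate(seq):
        for candidate in AMINO_ACIDS:
            if candidate != original:
                yield position, candidate, seq[:position] + candidate + seq[position + 1:]


def rank_substitutions(counts, seq, top=3):
    """Return the `top` substitutions with the largest gain in toy score."""
    baseline = log_likelihood(counts, seq)
    scored = [
        (log_likelihood(counts, variant) - baseline, position, candidate)
        for position, candidate, variant in single_substitutions(seq)
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top]


# Made-up toy data for illustration only (not real antibodies).
toy_training = ["MKTAYIAKQR", "MKTLLVAKQA", "MKVAYLAKQG"]
counts = train_counts(toy_training)
for gain, position, candidate in rank_substitutions(counts, "MKWAYIAKQR"):
    print(f"position {position}: change to {candidate} (score gain {gain:.2f})")
```

In practice the candidate list would be far larger than single substitutions, and the ranking would come from a trained model combined with experimental screening rather than from letter-pair counts.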
In a preprint paper published online in August, Absci researchers used this approach to enhance the antibody-based cancer drug trastuzumab so that it binds more tightly to a target on the surface of cancer cells. Tighter binding could mean fewer doses, shorter treatment regimens and fewer side effects for patients. In another paper, published in March in the Proceedings of the National Academy of Sciences, researchers at MIT, Tsinghua University and Beijing-based Huashenzhi Pharmaceuticals used a protein language model to modify a Covid-19 drug candidate that was effective only against the alpha, beta and gamma variants so that it could also treat the delta variant.
Round One Intelligence, a startup with operations in the US and China, aims to help clients use such models to translate common starting points for drug development, namely animal proteins such as antibodies from rabbits, into forms that match human physiology, said founder and chief executive officer Lurong Pan.
But drugmakers are also shifting their sights from modifying known proteins to ab initio design, in which molecules are created from scratch.
Genentech says a recent experiment shows that it can design an antibody that binds to the same cellular target as pertuzumab, a breast cancer drug marketed under the brand name Perjeta, but with an entirely new amino acid sequence. The company’s scientists gave its protein language models only the three-dimensional shapes of the target and the antibody, which are the main determinants of a protein’s function, says Richard Bonneau, executive director of Genentech. Bonneau joined Genentech last year when it acquired Prescient Design, the company he founded.
Absci and Huashenzhi Pharmaceuticals also work with pharmaceutical manufacturers to develop drugs for cancer and autoimmune diseases using ab initio design methods. Absci announced a partnership with Merck in January to study three drug targets, according to McClain. A Merck spokeswoman said the company has undertaken several collaborations to explore the potential of AI in drug development. Ken Peng, founder and CEO of Huashenzhi Pharmaceuticals, said the company signed agreements with two major pharmaceutical companies last month with the aim of treating diseases for which no relevant drugs previously existed.
Round One Intelligence’s Lurong Pan said, “For a long time, the challenges in drug development have been stuck, waiting for a new wave of technology to solve them. This is truly a disruptive approach.”
Many computational biologists expect protein language modeling to yield benefits beyond speeding up drug development. Biologists say the same technology might be used to produce better enzymes for environmental applications such as degrading plastics, treating wastewater and cleaning up oil spills.
ProFluent Bio’s Dr. Madani says, “Proteins are the foundation of life. They allow us to breathe and to see; they sustain the environment; they make humans sick, but they can also heal. If we can work together to design better or entirely new tools, then proteins could have a very wide range of applications.”