Meta AI unlocks hundreds of millions of proteins to aid drug discovery

Facebook parent company Meta Platforms Inc. has created a tool to predict the structure of hundreds of millions of proteins using artificial intelligence. Researchers say it promises to deepen scientists’ understanding of biology and perhaps accelerate the discovery of new drugs.

Meta’s research arm, Meta AI, used the new AI-based computer program known as ESMFold to create a public database of 617 million predicted proteins. Proteins are the building blocks of life and of many medicines, and they are necessary for the functioning of tissues, organs and cells.

Protein-based drugs are used to treat heart disease, certain cancers and HIV, among other conditions, and many pharmaceutical companies have started using artificial intelligence to pursue new drugs. Using AI to predict protein structures is expected not only to increase the effectiveness of existing drugs and drug candidates, but also to help discover molecules that can treat diseases whose cures have remained elusive.

With ESMFold, Meta takes on another protein prediction computer model known as AlphaFold from DeepMind Technologies, a subsidiary of Google parent Alphabet Inc.

DeepMind said last year that the AlphaFold database contains 214 million predicted proteins that could help accelerate drug discovery.

Meta says ESMFold is 60 times faster than AlphaFold, but less accurate. The ESMFold database is larger because it includes predictions based on genetic sequences that had not been studied before.

Predicting a protein’s structure can help scientists understand its biological function, according to Alexander Rives, co-author of a study published Thursday in the journal Science and a research scientist at Meta AI. Meta had previously released the paper describing ESMFold in November 2022 on a preprint server.

“Often proteins with similar structures have similar biological functions,” said Dr. Rives. “And if you can have a really high-resolution structure, then you can start thinking about what the actual biochemical function of these proteins is.”

According to Meta, about a third of the protein structures in the ESMFold database were predicted with high confidence.

The quest to predict protein structure and subsequently function has been ongoing for the past decade. Because proteins constantly fold and refold themselves before forming their final structure, determining protein structures has been difficult and expensive for scientists. Rather than relying on microscopes that can image protein structures at the atomic level, the new AI models learn to predict protein shapes in hours or days instead of months or years.

Meta researchers generated the predictions using a form of AI known as a large language model, which can predict text based on just a few letters or words. It’s the same technology that allows OpenAI’s ChatGPT to generate humanlike responses.

The Meta scientists gave the ESMFold program a series of letters that represent the amino acids that make up a protein’s genetic code. The AI model then learned how to fill in the sections in the sequence that were blank or hidden. Once it generated a full sequence, ESMFold was able to learn the relationship between known protein sequences and structures already well understood by scientists to predict the structures of new ones.
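
The fill-in-the-blank exercise described above can be illustrated with a short Python sketch. It is not Meta's code: the function names, the underscore placeholder, the 15% masking rate and the example sequence are all assumptions made for illustration. It shows only how letters in an amino-acid sequence might be hidden and later restored; in a real system a trained model, not a lookup table, would supply the guesses.

```python
import random

# The 20 standard amino acids, one letter each.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "_"  # hypothetical placeholder for a hidden position


def mask_sequence(sequence, mask_fraction=0.15, seed=0):
    """Hide a fraction of positions; return the masked string and the hidden letters.

    The 15% default is an assumption for illustration, not a figure from the article.
    """
    rng = random.Random(seed)
    n_masked = max(1, round(len(sequence) * mask_fraction))
    positions = sorted(rng.sample(range(len(sequence)), n_masked))
    hidden = {i: sequence[i] for i in positions}
    masked = "".join(MASK if i in hidden else aa for i, aa in enumerate(sequence))
    return masked, hidden


def fill_in(masked, predictions):
    """Rebuild a full sequence from a masked one and per-position guesses."""
    return "".join(predictions.get(i, aa) for i, aa in enumerate(masked))


if __name__ == "__main__":
    protein = "MKTAYIAKQRQISFVKSHFSRQ"  # made-up example sequence
    masked, hidden = mask_sequence(protein)
    print(masked)
    # Here the "predictions" are simply the true hidden letters; a language
    # model would be trained to produce them from the surrounding context.
    print(fill_in(masked, hidden) == protein)
```

During training, a model's guesses for the hidden positions would be compared with the true letters, and the model adjusted to guess better next time.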

Meta scientists say the strength of ESMFold is the speed with which it can predict protein structures, allowing researchers to search large genetic databases for potential applications in medicine, health, nutrition and the environment.

“It’s a big achievement, but it depends a lot on the previous work,” said Olexandr Isayev, a computational biologist at Carnegie Mellon University, who was not involved in the study.

A biotech executive says he prefers AlphaFold over ESMFold because of its accuracy. “The bottleneck isn’t computation, so faster isn’t better, but better is more accurate,” said Chris Bahl, chief scientific officer and co-founder of AI Proteins, a Boston-based startup that uses artificial intelligence tools to develop synthetic proteins.

Dr. Rives said ESMFold is already being used by several academic research groups and biotech companies.

The ESMFold model has been downloaded about 250,000 times a month since its release in 2022 and predicts 1,000 protein structures per hour, according to a Meta spokeswoman.

Since AlphaFold was first released in 2021, according to DeepMind, more than a million researchers and biologists in more than 190 countries have used the database to look at three million protein structures.

“From what we’re seeing right now, the accuracy isn’t quite there yet with protein language models like ESMFold, which are less accurate than models like AlphaFold,” said a spokeswoman for DeepMind. “However, we expect that in many cases there will be good predictions in the ESMFold database.”

AI prediction models from DeepMind and Meta each have their strengths and will lead to new discoveries, said Andrew Ferguson, co-founder of Chicago-based biotech Evozyne and an associate professor of molecular engineering at the University of Chicago.

“They’re complementary,” said Dr. Ferguson, who added that the Meta AI model was “a very elegant idea.”

Evozyne has entered into a partnership with technology company Nvidia Corp. to develop its own language model, which skips a protein’s structure and predicts its biological function. Evozyne then used this model to develop two proteins, according to an article posted to a preprint server in January.

Write to Eric Niiler at eric.niiler@wsj.com

Copyright ©2022 Dow Jones & Company, Inc. All rights reserved.