Guided by a multimodal generative language model called ESM3, Thomas Hayes and his team of researchers at EvolutionaryScale (a U.S. company specializing in artificial intelligence) generated and synthesized a previously unknown bright fluorescent protein. Its sequence is so different from those of known fluorescent proteins that the researchers say creating it is equivalent to ESM3 simulating 500 million years of biological evolution.
As published in Science, the model could provide a new way to “search” the space of possible proteins, both to better understand how naturally evolved proteins work and to develop new proteins for medicine, environmental remediation and a host of other applications.
ESM3 can reason about the sequence, structure and function of proteins by representing each of them through alphabets of discrete tokens that can be combined in a single generative language model. This strategy differs from previous efforts, which scaled language models on protein sequences alone.
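To make the idea of separate token alphabets concrete, here is a minimal Python sketch. It is a toy illustration only, not ESM3's tokenization scheme: the three vocabularies, the `MASK` value and the `tokenize` helper are hypothetical names invented for this example. It shows how sequence, structure and function can each be turned into a per-residue track of integer tokens, with unknown positions left masked for a generative model to fill in.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence

# Hypothetical toy alphabets; a real model's vocabularies are far larger.
SEQUENCE_VOCAB = {aa: i for i, aa in enumerate("ACDEFGHIKLMNPQRSTVWY")}
STRUCTURE_VOCAB = {s: i for i, s in enumerate(["helix", "sheet", "coil"])}
FUNCTION_VOCAB = {f: i for i, f in enumerate(["none", "chromophore", "binding"])}

MASK = -1  # assumed placeholder id for positions left for the model to generate


@dataclass
class TokenizedProtein:
    sequence: List[int]
    structure: List[int]
    function: List[int]


def encode(track: Sequence[Optional[str]], vocab: Dict[str, int]) -> List[int]:
    """Map each entry to its token id; None becomes the mask token."""
    return [MASK if item is None else vocab[item] for item in track]


def tokenize(
    sequence: Sequence[Optional[str]],
    structure: Sequence[Optional[str]],
    function: Sequence[Optional[str]],
) -> TokenizedProtein:
    """Build three aligned token tracks, one entry per residue."""
    if not (len(sequence) == len(structure) == len(function)):
        raise ValueError("all tracks must have one entry per residue")
    return TokenizedProtein(
        sequence=encode(sequence, SEQUENCE_VOCAB),
        structure=encode(structure, STRUCTURE_VOCAB),
        function=encode(function, FUNCTION_VOCAB),
    )


# A three-residue prompt: one unknown residue and one unknown structure state.
prompt = tokenize(
    sequence=["M", None, "K"],
    structure=["coil", "helix", None],
    function=["none", "none", "chromophore"],
)
print(prompt)
```

Because every track shares the same length and the same mask convention, a prompt can fix some properties (for example a functional annotation) while leaving others open, which is the kind of combined conditioning the article describes.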
The training data for ESM3 consists of 771 billion unique tokens created from 3.15 billion protein sequences, 236 million protein structures, and 539 million proteins with function annotations. The model has been scaled up to 98 billion parameters.
ESM3 is already available in public beta via an API, allowing scientists to design proteins programmatically or through interactive browser-based applications. Researchers can use the EvolutionaryScale Forge API through the free academic access tier or use the open model code and weights.