The Cross-Species Cell Atlas: Chan Zuckerberg Initiative Releases TranscriptFormer AI Model

Hoca

“Hey model, if I prompt you with the marker genes for a cell type, can you complete the transcription factors that you believe would be highly expressed with expression of these genes?” asked Theofanis Karaletsos, head of artificial intelligence (AI) at the Chan Zuckerberg Initiative (CZI), imagining a future in which scientists no longer wade through mountains of literature, data, and experiments to answer a very pointed biological question.

Rather, Karaletsos sees a transition point where AI models interrogate patterns in big data to produce meaningful biological insights on demand, much like the difference between querying ChatGPT and perusing hundreds of library books.


In a step toward that vision for single-cell transcriptomic data, Karaletsos and colleagues at CZI have released TranscriptFormer, a generative, multi-species model for probing cellular biology across organisms.

In a preprint posted on bioRxiv that has not yet been peer reviewed, the authors showed that TranscriptFormer can be prompted to predict cell-type-specific transcription factors and gene-gene interactions that align with independent experimental observations.

TranscriptFormer is trained on single-cell transcriptomics data from over 110 million cells from 12 different species, thereby covering 1.5 billion years of evolution. The model’s additional broad capabilities include disease state identification, comparative biology, encoding multi-level biological structure, and more.


“If you think about the cell atlas field, people have been churning out data for the past 10 years, but no one’s figured out how to put it together into a single reference,” said Stephen Quake, PhD, head of science at CZI and co-corresponding author, in an interview with GEN. “We view TranscriptFormer as a way to do the equivalent of genome assembly for all the cell atlas data.”

Quake said TranscriptFormer will have strong applications in designing engineered cell states for synthetic biology and cellular therapies, and will also offer evolutionary insights into the relationships between species.

The TranscriptFormer family of generative models is trained on cell atlases from species across evolution and development, generating outputs used for a variety of downstream tasks. [Chan Zuckerberg Initiative]

TranscriptFormer is a step forward for CZI’s virtual cell program, one of four scientific grand challenges that the nonprofit set earlier this April in its effort to transform human health at the intersection of AI and biology. The remaining challenges include developing imaging technologies to map complex biological systems, creating new tools for measuring inflammation in tissues in real time, and harnessing the immune system for early detection, prevention, and treatment of disease.

In addition, CZI is betting that “you’re going to need more data” while being strategic about which data to generate, according to Quake. The release of TranscriptFormer follows CZI’s February announcement of the Billion Cells Project, a collaboration with 10x Genomics and Ultima Genomics to generate an unprecedented one-billion-cell dataset to fuel rapid progress in AI model development in biology.

CZI is not the only entity in the growing virtual cell space. Earlier this week, the Arc Institute announced efforts to grow the Arc Virtual Cell Atlas. The Palo Alto-based research institute has also been known to make large bets in big data-driven AI. In February, Arc’s genome foundation model, Evo 2, built in collaboration with Nvidia, made waves as the largest publicly available AI model for biology.


What is a cell type

According to the authors, TranscriptFormer demonstrates how expanding to broader evolutionary pre-training data can enhance a model’s ability to generalize across tasks and species.

“We didn’t imbue the model with knowledge about ‘what is a cell type’ or ‘what is a species.’ It sees millions of cells and learns emergent structures,” Karaletsos told GEN.

In cell type classification, TranscriptFormer successfully identified cell types in new species that were not seen during training and are separated by over 685 million years of evolutionary distance. In disease state prediction, the model effectively distinguished SARS-CoV-2-infected cells from healthy cells without the need for a COVID-19-specific cell dataset. TranscriptFormer also learned emergent properties of multi-level biological structure and clustered cells according to their tissue context.

Additionally, Karaletsos highlighted that the ability to do cross-species analysis provides opportunities to elucidate how model organisms can translate to humans.

“We know mice teach us a lot about toxicity and Phase I clinical trials, but mechanistically we don’t have a model that predicts which things will transfer and which things won’t,” Karaletsos told GEN. “TranscriptFormer is a solid first step on that trajectory.”

As for whether a ChatGPT-style prompter for biological data has arrived, Karaletsos said TranscriptFormer has a long road of iterative growth ahead. The team will focus next on expanding the model's training data to more diverse species and to additional modalities, such as proteomics and genomics, to further extend its ability to generalize.

TranscriptFormer is publicly accessible on CZI’s virtual cells platform with code available on GitHub. In addition, a demo tutorial notebook on cross-species cell prediction has been released for the biological research community.

The post The Cross-Species Cell Atlas: Chan Zuckerberg Initiative Releases TranscriptFormer AI Model appeared first on GEN - Genetic Engineering and Biotechnology News.
 