CytoTRACE 2: AI modeling of individual cell potential creates a vast space for new discoveries in cancer and developmental biology
By Christopher Vaughan
October 27, 2025
In recent years, the Newman lab has been a foundry for advanced scientific tools. The analytical methods the lab has developed have empowered biological researchers around the world and led to exciting new discoveries. Among those tools, Associate Professor of Biomedical Data Science Aaron Newman, PhD, counts CytoTRACE as one of the most successful. “We showed that the number of genes that were expressed (used to make proteins) in a cell was directly correlated to the cell’s ‘stem-ness.’” That is to say, by simply measuring the number of genes expressed, researchers can tell whether a cell is closer in development to a stem cell that can produce other cells, or a fully mature cell that is destined to live out its life and die.
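The gene-counting idea can be illustrated with a minimal sketch. This is not the CytoTRACE implementation; the cell names, transcript counts, and the helper functions `genes_expressed` and `rank_by_gene_count` are all hypothetical, chosen only to show what "number of genes expressed" means in practice.

```python
# Hypothetical toy data: each cell maps to transcript counts for six genes.
# These numbers are invented for illustration, not taken from any study.
counts = {
    "cell_a": [5, 0, 3, 2, 1, 4],
    "cell_b": [9, 0, 0, 0, 7, 0],
    "cell_c": [1, 2, 0, 3, 0, 0],
}


def genes_expressed(cell_counts, threshold=0):
    """Count the genes whose transcript count exceeds `threshold`."""
    return sum(1 for c in cell_counts if c > threshold)


def rank_by_gene_count(counts_by_cell):
    """Order cells from most to fewest expressed genes."""
    return sorted(
        counts_by_cell,
        key=lambda cell: genes_expressed(counts_by_cell[cell]),
        reverse=True,
    )


ranking = rank_by_gene_count(counts)
print(ranking)
```

Under the relationship described above, cells earlier in this ranking would be treated as closer to a stem-like state than cells later in it. The ranking is only relative to the other cells in the same table.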
“One cool example of this approach facilitating new discoveries was when other researchers used CytoTRACE to look for cells that regenerate the mouse intestine and found a completely new stem cell population,” Newman says.
The results of CytoTRACE 1 were impressive, but Newman and his colleagues weren’t satisfied. “One major question was, ‘Can we do better using AI?’” They found that they could, and that the result, CytoTRACE 2, published Monday, October 27th, 2025, in the journal Nature Methods, allows researchers to go far beyond the original techniques. By training an AI model on billions of biological data points, Newman and his colleagues have built a kind of model of cell activity, one that can find answers to questions that were virtually unanswerable before. “We can understand the developmental potential of cells in a new way,” Newman says.
Newman and his colleagues, including co-first authors Minji Kang, Gunsagar Gulati, MD, PhD, Erin Brown, and Zhen Qi, PhD, began by creating an AI architecture that they could apply in a way that overcomes the shortcomings of CytoTRACE 1, he says. “For instance, before this we couldn’t compare potency of cells across data sets. The most stem-like cell in one data set might be the least in another.” With CytoTRACE 2, they can.
They were also able to measure something much more profound and powerful. “With CytoTRACE 2 we are also able to understand what drives the prediction of potency levels,” Newman says. “We are able to ask, ‘what are the gene programs that define those potency states?’”
To create CytoTRACE 2, they first trained an AI engine with billions of measurements of gene activity in cells that had been previously validated as being in different developmental levels. They separated those cells into six categories of varying potency: totipotent (the fertilized egg), pluripotent (embryonic stem cells), multipotent (adult tissue stem cells), unipotent, etc. The result was a kind of atlas of gene activity, correlated with known potency levels. Further refinements allowed them to divide those six categories into 24 sub-categories of cell potency.
In doing so, they created a measure of absolute potency. Potency measurements from one data set can be compared directly with measurements from a completely different data set, and the model can predict the potency of all the cells involved.
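The difference between relative and absolute potency can be seen in a small sketch. It assumes, purely for illustration, that every cell already has a score on a shared scale from 0 to 1; the data set names, cell labels, scores, and the `relative_scores` helper are hypothetical and do not reflect how CytoTRACE 1 or CytoTRACE 2 compute their outputs.

```python
def relative_scores(values):
    """Min-max scale scores within one data set.

    The result only ranks cells against others in the same data set.
    """
    lo, hi = min(values.values()), max(values.values())
    return {cell: (v - lo) / (hi - lo) for cell, v in values.items()}


# Hypothetical scores on a shared 0-to-1 scale (1 = most potent).
dataset_a = {"a1": 0.9, "a2": 0.7, "a3": 0.5}
dataset_b = {"b1": 0.5, "b2": 0.3, "b3": 0.1}

rel_a = relative_scores(dataset_a)
rel_b = relative_scores(dataset_b)

# Relative view: a3 is the least potent cell in A, b1 the most potent in B.
print(rel_a["a3"], rel_b["b1"])

# Shared-scale view: the two cells have the same score.
print(dataset_a["a3"], dataset_b["b1"])
```

In this toy case a relative score labels `a3` as the bottom of its data set and `b1` as the top of its own, even though the two cells sit at the same point on the shared scale. Only the shared scale allows a meaningful comparison across data sets.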
But the AI model goes much further than that. By learning known gene activity of tens of thousands of genes across hundreds of cells that are known to be at certain points in development, the model is able to learn which genes tend to be active at the same time and which don’t. “The model is able to learn biology,” Newman says. “It is able to learn molecular programs.”
That means that they can now screen for the genes that are involved in maintaining potency in cells or inhibiting differentiation. They can also look at a particular gene and predict what kinds of cells it will be active in. “In general, you can’t do this,” Newman says. “You can’t take major markers and predict if they will be active in a cell with a certain type of developmental state.” But with CytoTRACE 2, researchers can.
Going even further, they were able to look for molecular pathways linked to multipotency, a state found in many adult stem cells, and found some startling associations. “We were surprised to find that the cholesterol metabolism pathway is associated with multipotency in cells that have nothing to do with cholesterol per se,” Newman says. “We also found that genes associated with fatty acid synthesis are, surprisingly, also associated with multipotency.”
Where CytoTRACE 1 can indicate the relative potency of any cell in relation to others, CytoTRACE 2 can give an indication of absolute potency of a cell and tell researchers what factors are important for that state, Newman says.
The researchers have just begun looking at how to apply CytoTRACE 2 to cancer research, and the results they get are promising. Even though the AI engine is trained on data from normal cells, analysis of cells involved in acute myeloid leukemia gives information that reflects known biology, the researchers say. When they analyze oligodendroglioma cells using CytoTRACE 2, they see stem-like cells with the highest potency, which is precisely what is expected. “These tests validate the answers that CytoTRACE 2 is giving us,” Newman says.
As the analyses provided by CytoTRACE 2 become further validated, researchers should be able to analyze less well-defined cancers to find important cell types and biochemical pathways. “Already we have used CytoTRACE 2 to identify cancer cell stages and marker genes at the single cell level for the first time,” Newman says. “We have also been able to associate them with therapy response and survival.”
Finding gene targets for human cancers should become much more efficient. With about 20,000 genes active in living cells, picking out the ones key for sustaining the cancer can be a challenge. “Traditionally, the approach has involved some element of guesswork, where scientists identify a few genes that might be of interest and test them in mice,” Newman says. “With CytoTRACE 2, you can go directly to the human data, identify cells that are higher in potency and identify molecules that are important to this state. It narrows the space you have to search and boosts the ability to find valuable drug targets to fight cancer.”
The capabilities provided by CytoTRACE 2 “are a big leap forward,” Newman says. “We can understand the developmental potential of cells in a new way, which has implications for understanding developmental biology as well as cancer and other medical issues.”