Mathematical innovation turns blood draw into information gold mine in study

- By Bruce Goldman

Atul Butte

Scientists at the Stanford University School of Medicine have devised a software algorithm that could enable a common laboratory device to virtually separate a whole-blood sample into its different cell types and detect medically important gene-activity changes specific to any one of those cell types.

In a study published online March 7 in Nature Methods, the scientists reported that they had successfully used the new technique to pinpoint changes in one cell type that flagged the likelihood of kidney-transplant recipients rejecting their new organs. Without the software, these gene-activity flags would have gone unnoticed. The authors believe the new algorithm may have applications beyond kidney rejection, helping doctors better identify the onset of cancers, genetic disorders and a variety of other problems.

The lab device, called a microarray, is a standard research tool. But until the development of this algorithm, scientists and physicians had not been able to use it to derive such medically useful information from whole-blood samples. Part of the problem is that whole blood's complex, many-component makeup obscures that information.

“Drawing blood is one of the most common diagnostic tests in clinical practice,” said one of the investigators, Atul Butte, MD, PhD, assistant professor of pediatrics and of medical informatics. “We’d love to be able to use microarrays to find changes in the blood that indicate trouble somewhere in the body. But distinguishing one type of cell from another can be critical to doing that.”

Butte is a senior author of the paper, along with Mark Davis, PhD, director of the Stanford Institute for Immunity, Transplantation and Infection. The two lead authors are postdoctoral scholar Shai Shen-Orr, PhD, and Robert Tibshirani, PhD, professor of health research and policy and of statistics.

The potential for extracting important information from a blood sample has mushroomed since the advent of the microarray about 15 years ago. A microarray is a man-made, thumbnail-sized chip whose surface holds a grid of tens of thousands of tiny DNA sensors, each able to recognize a different short sequence of nucleic acids, the genetic material of all life. The chip can be bathed in an extract from living cells, such as blood; whenever a sensor captures a matching nucleic-acid sequence, it gives off a fluorescent signal recording that sequence's presence.

By using microarrays to measure how actively a gene is being “expressed,” research scientists can detect medically important alterations in a tissue. As they get steadily cheaper and easier to work with, microarrays are also at the threshold of widespread use as clinical diagnostic devices.

Still, whole blood poses a complication when used as a sample in microarray analyses. “Any 7-year-old can look at a blood sample under a microscope and see it’s a mix of a huge number of different kinds of cells,” said Butte, who is also director of the Center for Pediatric Bioinformatics at Lucile Packard Children’s Hospital. A single sample contains dozens of cell types, at different levels of maturity or at different stages of activation. A gene-expression change that, in one cell type, means something has gone terribly wrong may in another cell type be completely benign, or even a sign of needed activation. But a microarray has no way of knowing which kind of cell in the mix a particular nucleic-acid snippet came from.

Mark Davis

To make things more difficult, the composition of samples drawn from two different patients — or even of two samples drawn at different times from the same patient — varies dramatically.

Imagine that a public-opinion analyst, new on the job, conducts two national voter-preference surveys, one before and one after a politician's speech, to see whether the speech improved or hurt the popularity of a piece of legislation. But the rookie analyst neglected to ask respondents which party they lean toward or what state they live in, and so doesn't realize that the first sample had a Democrat-to-Republican ratio of 30:70, while in the second the ratio was reversed. The analyst might mistakenly infer a huge swing between pre- and post-speech preferences, when in fact the only real change was in the samples' composition. Meanwhile, a sharp shift in support among residents of a small but election-swinging state might go undetected.

In the same way, comparing the gene-expression pattern of one person's whole-blood sample with another person's, or even with the same person's blood over time, isn't very informative in a typical microarray run. Medically significant changes in gene-expression patterns can go unnoticed in those tests, while apparent changes that merely reflect shifts in the sample's composition may trigger false alarms.
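To make the survey analogy concrete in blood terms, here is a small arithmetic sketch. It assumes, for illustration only, that a whole-blood reading for a gene is simply each cell type's expression weighted by that cell type's share of the sample; the cell types, fractions and expression values are all hypothetical. The first scenario changes only the mix of cells and produces a sizable apparent change; the second holds the mix fixed while a rare cell type doubles its expression, and the bulk reading barely moves.

```python
def bulk_signal(fractions, per_cell_expression):
    """Whole-sample reading as a fraction-weighted sum of per-cell-type expression.

    This simple linear mixing is an illustrative assumption.
    """
    return sum(f * e for f, e in zip(fractions, per_cell_expression))


# Scenario 1 (false alarm): a hypothetical gene is expressed at 10 units in cell
# type A and 2 units in cell type B in BOTH samples; only the A:B mix flips.
expression = [10.0, 2.0]
before = bulk_signal([0.3, 0.7], expression)  # 3.0 + 1.4 = 4.4
after = bulk_signal([0.7, 0.3], expression)   # 7.0 + 0.6 = 7.6
print(f"apparent change from composition alone: {after - before:.1f}")  # 3.2

# Scenario 2 (missed signal): the mix stays fixed, but a rare cell type C
# (1% of the sample) doubles its expression of the gene from 5 to 10 units.
mix = [0.50, 0.49, 0.01]
rare_before = bulk_signal(mix, [10.0, 2.0, 5.0])
rare_after = bulk_signal(mix, [10.0, 2.0, 10.0])
print(f"real change in cell type C, seen in bulk: {rare_after - rare_before:.2f}")  # 0.05
```

In the second scenario a genuine doubling inside one cell population shifts the whole-sample reading by less than 1 percent, which is easy to lose in ordinary measurement noise.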

While ways of separating whole blood into its constituent cell types do exist, these methods are too tedious, time-consuming and costly for routine clinical diagnostics and, for similar reasons, pose a challenge for research on large groups of subjects.

So the investigators devised an algorithm: in this case, a very large number of fairly simple equations. Solving all of these equations simultaneously, they reasoned, would let them assign gene-expression changes to particular cell types in patients' blood samples.
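The release doesn't spell out those equations. One standard way to set them up, and the assumption behind this sketch, is a linear mixing model in which each whole-blood measurement is a blend of cell-type-specific expression levels, weighted by how common each cell type is in the sample, with the unknown per-cell-type levels estimated by least squares:

$$
x_{ig} = \sum_{k=1}^{K} w_{ik}\, h_{kg} + \varepsilon_{ig},
\qquad
\hat{\mathbf{h}}_{g} = \arg\min_{\mathbf{h}} \lVert \mathbf{x}_{g} - W\mathbf{h} \rVert^{2} = \left(W^{\top} W\right)^{-1} W^{\top} \mathbf{x}_{g}
$$

Here $x_{ig}$ is the whole-blood microarray signal for gene $g$ in sample $i$, $w_{ik}$ is the fraction of sample $i$ made up of cell type $k$ (the matrix $W$ collects these fractions), $h_{kg}$ is the typical expression of gene $g$ in cell type $k$, and $\varepsilon_{ig}$ is measurement noise; $\mathbf{x}_g$ stacks one gene's signal across all samples. Writing one such equation per sample per gene is one way to arrive at a very large number of simple equations. The least-squares solution exists only when there are at least as many samples as cell types and their compositions vary enough for $W^{\top} W$ to be invertible. The published method may differ from this sketch in its details.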

To test their algorithm's accuracy, the researchers obtained whole-blood samples from 24 pediatric kidney-transplant patients. Fifteen of the 24 patients were experiencing symptoms of acute transplant rejection; the other nine were in stable condition.

Because complete blood counts had been routinely performed on these patients, the proportions of five important blood-cell types in each sample (monocytes, lymphocytes, neutrophils, basophils and eosinophils) were known.

When the investigators analyzed the patients' whole-blood samples by microarray without the new algorithm, they couldn't detect any differences in gene-expression patterns between the two patient groups. But when they applied the new algorithm, they found hundreds of differences in gene expression, which could be used to tell which patients were rejecting their transplants and which were not. Of equal importance, the method let the researchers see that these changes were largely confined to one particular cell type: the monocytes. Only the new virtual-separation technique made it possible to finger this cellular culprit.
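The group comparison described above can be sketched in a few lines of Python with NumPy. The sketch assumes a linear mixing model (whole-blood expression equals cell-type fractions times per-cell-type expression), estimates per-cell-type expression separately in each patient group by least squares, and subtracts one group's estimates from the other's. The function names, the synthetic data and the planted monocyte-only shift are hypothetical; the data merely mirror the study's group sizes (15 rejecting, 9 stable) and its five cell types. The sketch also leaves out the statistical significance testing a real analysis would need.

```python
import numpy as np

CELL_TYPES = ["monocytes", "lymphocytes", "neutrophils", "basophils", "eosinophils"]


def deconvolve(expr, fractions):
    """Least-squares estimate of per-cell-type expression (hypothetical helper).

    expr:      (samples x genes) whole-blood expression
    fractions: (samples x cell types) cell-type proportions, e.g. from blood counts
    returns:   (cell types x genes) estimated expression in each cell type
    """
    estimate, *_ = np.linalg.lstsq(fractions, expr, rcond=None)
    return estimate


def cell_type_contrast(expr, fractions, is_case):
    """Deconvolve each group separately; return case-minus-control differences."""
    is_case = np.asarray(is_case, dtype=bool)
    case = deconvolve(expr[is_case], fractions[is_case])
    control = deconvolve(expr[~is_case], fractions[~is_case])
    return case - control  # (cell types x genes)


# Hypothetical, noise-free synthetic data: 24 samples, 5 cell types, 100 genes.
rng = np.random.default_rng(0)
fractions = rng.dirichlet(np.ones(len(CELL_TYPES)), size=24)
baseline = rng.uniform(1.0, 10.0, size=(len(CELL_TYPES), 100))
rejecting = baseline.copy()
rejecting[0, :10] += 3.0  # planted change: first 10 genes, in monocytes only
is_case = np.array([True] * 15 + [False] * 9)
expr = np.where(is_case[:, None], fractions @ rejecting, fractions @ baseline)

diff = cell_type_contrast(expr, fractions, is_case)
# Cell type whose estimated expression differs most, summed over all genes.
top_cell_type = CELL_TYPES[int(np.abs(diff).sum(axis=1).argmax())]
print(top_cell_type)  # monocytes
```

Because the synthetic data contain no noise, the planted monocyte shift is recovered essentially exactly and the other cell types show near-zero differences. With real, noisy measurements, estimates for rare cell types such as basophils would likely be much less stable, which is one reason a significance test belongs on top of this step.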

“It was like a giant arrow pointing to the biological source of the rejection problem,” said Davis, the Burt and Marion Avery Family Professor of Immunology and a Howard Hughes Medical Institute investigator.

Other Stanford co-authors were Dale Bodian, PhD; Trevor Hastie, PhD; Purvesh Khatri, PhD; Nicholas Perry; and Minnie Sarwal, MD, PhD. None of the co-authors has any financial stake in the new software technology. They intend to distribute it to the academic and nonprofit investigator communities free of charge and, perhaps, to license it to for-profit companies in order to speed its dissemination.

The study was supported by the National Institute of Allergy and Infectious Diseases, the National Heart, Lung, and Blood Institute, and the National Cancer Institute, all arms of the National Institutes of Health.

About Stanford Medicine

Stanford Medicine is an integrated academic health system comprising the Stanford School of Medicine and adult and pediatric health care delivery systems. Together, they harness the full potential of biomedicine through collaborative research, education and clinical care for patients. For more information, please visit med.stanford.edu.
