AI’s Rapid Rise in Medicine Sparks a Call for Guardrails
Inside the JAMA Summit report calling for responsible oversight of AI in health care
October 21, 2025 - by Rebecca Handler
Artificial intelligence has entered medicine with astonishing speed. Once confined to research labs, algorithms now help diagnose disease, predict sepsis, process insurance claims, transcribe clinic visits, and even chat with patients about their mental health. Amid the excitement about AI's potential to improve health care, it is easy to forget how untested many of these tools still are.
A new report from the Journal of the American Medical Association (JAMA) brings that reality into focus. Drawing on a Summit convened by the journal involving more than sixty experts, including Stanford Medicine faculty Tina Hernandez-Boussard, PhD, Nigam Shah, MBBS, PhD, and Michelle Mello, JD, PhD, the report defines what responsible AI in health care should look like, maps out how deeply the technology has penetrated the field, and describes how little structure exists to ensure it's safe, fair, and effective.
The authors’ message is clear: while AI’s integration into medicine is inevitable, its benefits are not. Without a coordinated system for evaluating, regulating, and monitoring these tools, the technology could deepen inequities as easily as close them.
From the Clinic to the Cloud
AI’s potential reach now spans every layer of health care. In hospitals, algorithms can scan X-rays for signs of pneumonia, monitor vital signs for sepsis, and generate physician notes in real time. At home, consumers use mobile apps that analyze heart rhythms or skin lesions, while chatbots promise instant advice on anxiety, diet, or sleep. Behind the scenes, hospitals can deploy AI to manage operating room schedules, predict supply shortages, and optimize billing.
Some of these systems have FDA clearance as medical devices. Most do not. And even among those that do, few have been tested for what truly matters: whether they make people healthier. “These tools can have important health effects, good or bad, but those effects are often not quantified,” the authors write, citing the lack of regulatory standards and the difficulty of running controlled studies in messy, real-world environments.
So what makes the research so difficult? There are multiple factors. One is that there is little appetite, and there are few resources, for the kind of drawn-out studies typically done for new technologies. Another is that the technology moves faster than the science can follow. A sepsis-prediction algorithm can be updated weekly. A generative AI model can change daily. By the time a traditional clinical trial finishes, the tool being tested may no longer exist in the same form.
The Oversight Gap
In contrast to pharmaceuticals, AI in medicine has no unified regulatory pathway. The FDA reviews some algorithms that diagnose or treat disease, but its jurisdiction stops short of administrative software, wellness apps, and many clinical decision-support systems.
That leaves vast areas of health care untouched by federal oversight. A chatbot offering mental health support, for example, can reach millions of users without proving its safety or accuracy. The AI tools that have received FDA clearance are overwhelmingly imaging tools – for example, an algorithm that helps detect suspicious spots on a mammogram. "Even when the FDA does have authority," the report notes, "clearance does not necessarily require demonstration of improved clinical outcomes."
In the absence of regulation, the marketplace has become the de facto testing ground. Hospitals and consumers are learning through trial and error. The problem, as former FDA Commissioner Robert Califf warns in the report, is that “no health system in the United States is currently capable of validating an AI algorithm once it’s in use.”
A System Built for Drugs, Not Data
Regulatory systems in medicine, such as the FDA’s processes, clinical trial structures, and oversight rules, were designed for products like drugs or medical devices, which are stable and unchanging once approved. A pill or implant behaves the same way no matter who takes it or where it’s used, so regulators can evaluate it before it reaches the market.
AI tools are completely different. They are dynamic and context dependent. They learn from new data, adapt over time, and behave differently depending on how clinicians use them or to which populations they’re applied. That makes them moving targets for evaluation. The usual premarket testing model doesn’t work because an algorithm that’s “safe and effective” today could change tomorrow once it starts learning from new users or new data.
A sepsis alert might perform flawlessly in one hospital but fail in another because of differences in staffing or workflow. A wearable heart monitor might succeed with tech-savvy patients but miss early warning signs in older adults.
Traditional clinical trials aren’t well suited to such moving targets. They’re expensive, slow, and rigid. By the time results emerge, the code may have changed a dozen times. What’s needed, the authors say, are more flexible methods: embedded “platform” trials, rapid A/B testing, and continuous monitoring systems that can adapt alongside the technology.
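What might such continuous monitoring look like in code? Below is a minimal Python sketch, not drawn from the report, of a rolling post-deployment check on a binary risk model such as a sepsis-prediction tool: it recomputes discrimination (AUROC) over the most recent adjudicated cases and raises an alert when performance drops. The window size, the alert threshold, and the `record_case`/`check_drift` functions are illustrative assumptions.

```python
# Minimal sketch of post-deployment performance monitoring for a
# deployed binary risk model (e.g., a sepsis-prediction tool).
# The window size, threshold, and function names are illustrative
# assumptions, not from the JAMA report.
from collections import deque

from sklearn.metrics import roc_auc_score

WINDOW = 500          # number of recent cases to evaluate over
AUROC_FLOOR = 0.75    # alert if discrimination drops below this

scores = deque(maxlen=WINDOW)   # model risk scores, 0..1
labels = deque(maxlen=WINDOW)   # adjudicated outcomes, 0 or 1

def record_case(score: float, outcome: int) -> None:
    """Log one scored case and its eventual ground-truth outcome."""
    scores.append(score)
    labels.append(outcome)

def check_drift() -> bool:
    """Return True if rolling AUROC has fallen below the floor."""
    if len(labels) < WINDOW or len(set(labels)) < 2:
        return False  # not enough data, or only one class observed
    auroc = roc_auc_score(list(labels), list(scores))
    if auroc < AUROC_FLOOR:
        print(f"ALERT: rolling AUROC {auroc:.3f} below {AUROC_FLOOR}")
        return True
    return False
```

In a real deployment the alert would feed into a governance process rather than a print statement, but even this skeleton captures the shift the authors describe: from one-time premarket testing to ongoing surveillance.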
The Workforce Under Pressure
AI is also redefining what it means to work in medicine. Tools that once seemed futuristic, like digital scribes that generate clinical notes or ultrasound machines that interpret their own images, are now routine. These innovations can extend care to underserved areas and ease administrative burdens. But they also shift responsibilities, raising questions about deskilling, job displacement, and liability.
The report cautions against assuming AI will relieve burnout or restore time for human connection. Automation can just as easily increase demands. “It’s a tall ask to expect an AI tool to fix burnout,” the authors write. Instead, they argue, AI should be used to reshape systems around clinicians’ needs, not to squeeze more productivity out of already strained workforces.
Training will also need to evolve. Future clinicians will have to understand how algorithms work, when to trust them, and when to challenge them. AI literacy, the authors suggest, will soon be as fundamental as anatomy or pharmacology.
Ethics and Accountability in the AI Era
As AI takes on greater roles in diagnosis and decision-making, questions of ethics and liability become unavoidable. Who owns the data that train these systems? Who is responsible when they err? If an AI tool becomes standard practice, could a doctor be sued for not using it?
Current regulations provide few answers. The federal HIPAA law governs identifiable health information, but it doesn’t apply when data are “de-identified”—that is, shared without names, dates, or other personal details. Such de-identified data are commonly used in AI development, placing them outside HIPAA’s scope. Intellectual property laws define ownership of software, not of the data that fuel it. The report notes that as AI becomes more autonomous, the lines of accountability between developer, institution, and clinician will blur further.
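For readers unfamiliar with de-identification, a toy Python sketch of the idea behind HIPAA's Safe Harbor method appears below: direct identifiers are removed and exact dates coarsened before a record is used for model development. The field names are hypothetical, and real Safe Harbor de-identification spans 18 identifier categories with requirements well beyond this snippet.

```python
# Toy illustration of HIPAA Safe Harbor-style de-identification:
# drop direct identifiers before records are shared for AI training.
# Field names are hypothetical; real de-identification covers 18
# identifier categories (names, dates, geography, IDs, etc.) and
# should be done with vetted tooling, not a snippet like this.

# Direct-identifier fields to remove (a small subset of Safe Harbor).
IDENTIFIERS = {"name", "mrn", "ssn", "phone", "email", "street_address"}

def deidentify(record: dict) -> dict:
    """Return a copy of the record with direct identifiers removed
    and exact dates coarsened to the year."""
    clean = {k: v for k, v in record.items() if k not in IDENTIFIERS}
    if "admit_date" in clean:              # e.g., "2025-03-14"
        clean["admit_year"] = clean.pop("admit_date")[:4]
    return clean

patient = {
    "name": "Jane Doe",
    "mrn": "12345678",
    "admit_date": "2025-03-14",
    "lactate": 3.1,
    "heart_rate": 112,
}
print(deidentify(patient))
# {'lactate': 3.1, 'heart_rate': 112, 'admit_year': '2025'}
```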
The Moment of Choice
The JAMA report ends on a measured but unmistakably urgent note. AI, the authors write, is not a passing trend but a structural change in how medicine is practiced. The question is whether health care will treat it like a scientific innovation – tested, monitored, and improved – or like another piece of enterprise software.
“AI will disrupt every part of health and health care delivery in the coming years,” the report concludes. “The odds that this disruption will improve health for all will depend heavily on the creation of an ecosystem capable of rapid, efficient, and robust knowledge about the consequences of these tools.”
Four Steps Toward Accountability
To steer AI toward safer ground, the JAMA Summit report outlines four interlocking priorities:
Shared responsibility. Developers, clinicians, regulators, and patients must be involved throughout an AI tool’s life cycle, from conception to deployment and beyond. Collaboration can help anticipate unintended effects and rebuild trust that has been eroded by opaque algorithms and vendor secrecy.
Better measurement tools. Health systems need new ways to assess impact, including but not limited to safety. The authors call for frameworks similar to those used in drug surveillance, capable of tracking both benefits and harms after deployment.
A learning data infrastructure. The U.S. lacks a national system to study how AI performs across diverse populations. A shared, federated data network, modeled on the FDA's Sentinel program for drug safety, could help generate evidence faster and more fairly (a rough sketch of the idea follows this list).
Aligned incentives. Hospitals and developers currently have little motivation to conduct or share rigorous evaluations. Federal incentives, the authors suggest, could play the same catalytic role that government funding once did for electronic health record adoption.
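The report does not prescribe an architecture, but the Sentinel comparison suggests a federated design in which each hospital evaluates an AI tool against its own data and shares only summary statistics, never patient-level records. The minimal Python sketch below illustrates that division of labor; all names and numbers are hypothetical.

```python
# Rough sketch of a Sentinel-style federated evaluation: each site
# computes a local summary of an AI tool's performance, and the
# coordinator pools only those aggregates. Patient-level data never
# leaves the site. Names and structure are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class SiteSummary:
    site: str
    n_cases: int
    n_correct: int  # cases where the tool's output matched the outcome

def pool(summaries: list[SiteSummary]) -> None:
    """Report per-site and overall accuracy from shared aggregates."""
    total_cases = sum(s.n_cases for s in summaries)
    total_correct = sum(s.n_correct for s in summaries)
    for s in summaries:
        print(f"{s.site}: accuracy {s.n_correct / s.n_cases:.2%} "
              f"(n={s.n_cases})")
    print(f"Network-wide: {total_correct / total_cases:.2%} "
          f"across {total_cases} cases")

# Each hospital would run its own evaluation and share only a summary.
pool([
    SiteSummary("Hospital A", 1200, 1068),
    SiteSummary("Hospital B", 800, 616),
])
```

Pooling aggregates like these would also surface the site-to-site variation described earlier, where a tool that works in one hospital fails in another.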