Take two aspirin and call me in the morning: If only prescribing medications were as simple as that.
In reality, the prescription process involves many players and steps. Details must be accurately spelled out, interpreted, and double-checked to ensure patients get the correct drug and dosage.
“Errors can happen at any point, from the initiation of the prescription by the provider to the processing of the prescription by the pharmacy, to the time of administering the medication to the patient,” says Mohsen Bayati, a professor of operations, information, and technology at Stanford Graduate School of Business. “It can be a mistake related to the medication prescribed or in the directions given to the patient.”
“There may be a prescribing error on the physician’s part, and the pharmacy doesn’t catch it,” says Daniel Tawfik, an assistant professor of pediatrics at Stanford Medicine. “Or a prescription may come in correctly, but the way the pharmacist fills it turns it into a different dispensed medication. It could even be a simple typo due to manual entry, like specifying a dosage in milligrams vs. micrograms.”
Whatever their cause, prescription errors have significant consequences: at least 1.5 million preventable adverse events and an estimated $3.5 billion in costs per year in the U.S. There are mechanisms for catching such errors, including standardized guidelines for writing instructions for patients, rule-based systems, and pharmacist verification, especially for more error-prone medications such as the diabetes drug metformin. “But manual audit systems are labor-intensive and costly,” Bayati says.
That mix of complexity and high stakes intrigued Bayati and Tawfik, who have previously collaborated on ways to make healthcare more efficient and safer. With several collaborators at Amazon, Bayati developed an AI-based system for translating prescriptions with fewer errors. “An AI approach to this problem is a good choice because it’s capable of seeing patterns in prescription data and could flag prescriptions that match error-related patterns and write error-free ones,” Bayati says.
The researchers aimed their prescription-translation algorithm at the period when a pharmacy is preparing a prescription, focusing on the instructions provided to patients about the medication. “It’s the most common kind of error,” Bayati says. “Assuming the medication is correctly selected, it needs to be accompanied by a clear set of directions.” Their system, a large language model called MEDIC, translated doctors’ prescription instructions significantly more accurately than existing systems and other, broader LLMs.
Written prescriptions are good targets for the natural language processing that LLMs perform at scale. But just letting an LLM loose to translate prescription-instruction data isn’t optimal, Bayati points out, because “the models aren’t penalized much during their training for small errors like changing a dosage schedule from weekly to daily.” As Tawfik says, “A small change in language can be a large problem with a prescription.”
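To see why, consider a minimal sketch (illustrative only, with hypothetical direction strings, not data from the study): a one-word substitution leaves the text nearly identical by the surface-similarity measures that typically drive language-model training, even though it multiplies the dose.

```python
# Illustrative sketch: a clinically dangerous one-word change barely
# registers on a surface-similarity metric. The strings are hypothetical.
from difflib import SequenceMatcher

reference = "Take 1 tablet by mouth once weekly"
generated = "Take 1 tablet by mouth once daily"  # one word changed, 7x the dose

similarity = SequenceMatcher(None, reference, generated).ratio()
print(f"Surface similarity: {similarity:.2f}")  # roughly 0.9: "almost right"
# A loss or accuracy score built on this kind of overlap would barely
# penalize the model for an error a pharmacist would consider serious.
```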
Call the MEDIC
When the researchers initially tested standard LLMs on a set of prescription data from Amazon Pharmacy, they found higher error rates in translating prescriptions compared to the standard rule-based methods used by pharmacies. “We realized we had to combine approaches into something that takes the benefits of LLMs but has guardrails based on pharmacy guidelines,” Bayati says. “A combination of pharmacy domain knowledge and LLM capability.”
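In outline, that hybrid works something like the sketch below. It is a simplified illustration under assumed interfaces, not Amazon Pharmacy's actual pipeline: an LLM drafts the patient direction, rule-based checks derived from pharmacy guidelines must approve the draft, and anything that fails is routed to a pharmacist. The function names and the guideline table are hypothetical stand-ins.

```python
import re

# Hypothetical guardrail table: approved units and dosing frequencies per drug.
GUIDELINES = {
    "metformin": {"units": {"mg"}, "frequencies": {"once daily", "twice daily"}},
}

def passes_guardrails(drug: str, direction: str) -> bool:
    """Approve a drafted direction only if it uses an allowed unit
    and an allowed dosing frequency for this drug."""
    rules = GUIDELINES.get(drug)
    if rules is None:
        return False  # unknown drug: always route to a pharmacist
    unit = re.search(r"\d+\s*(mg|mcg|g|ml)", direction, re.IGNORECASE)
    if not unit or unit.group(1).lower() not in rules["units"]:
        return False
    return any(freq in direction.lower() for freq in rules["frequencies"])

def translate(drug: str, raw_sig: str, llm_draft_direction) -> str:
    draft = llm_draft_direction(drug, raw_sig)  # the LLM proposes a direction
    if passes_guardrails(drug, draft):
        return draft
    return "FLAGGED_FOR_PHARMACIST_REVIEW"  # guardrail failed: a human takes over

# Example: a draft that swaps mg for mcg is caught by the unit check.
draft = lambda drug, sig: "Take 500 mcg by mouth twice daily"
print(translate("metformin", "metformin 500 sig bid", draft))
# -> FLAGGED_FOR_PHARMACIST_REVIEW
```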
The researchers created their LLM by training it on a sample of Amazon Pharmacy data, including 1,000 expert-annotated, augmented prescription directions: raw prescription data from clinicians to which humans with pharmaceutical expertise added written labels and inputs.
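For a sense of what one such annotated example might contain, here is a hypothetical record (invented for illustration; the study's actual data format was not published): the clinician's shorthand "sig" alongside the pharmacist-written direction and structured labels.

```python
# Hypothetical expert-annotated training example, not actual Amazon
# Pharmacy data: raw clinician shorthand plus the labels and cleaned
# direction a pharmacist layered on top.
annotated_example = {
    "raw_sig": "metformin 500mg po bid w/ meals",  # as written by the clinician
    "expert_direction": "Take 1 tablet (500 mg) by mouth "
                        "twice daily with meals.",  # pharmacist-written target
    "labels": {
        "dose": "500 mg",
        "route": "by mouth",
        "frequency": "twice daily",
        "with_food": True,
    },
}
```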
They tested the system, which they called a “medication direction copilot,” or MEDIC, in multiple ways. For example, they compared MEDIC’s translations of inbound prescription instructions from physicians against human pharmacists’ translations to see how closely they matched. While MEDIC performed well on this measure, a different, broader LLM trained on 1.5 million prescriptions performed better according to standard measures of translation accuracy.
‘We can use AI to take out the repetitive, less intellectual tasks so clinicians can focus on caring for the patient, which is presumably more rewarding for them.’
—Mohsen Bayati
“But that’s not the right metric,” Bayati says. “The right one from a clinical point of view is which outputs are safer for patients.” They captured that measure by having pharmacists check prescription translations by MEDIC and the other LLMs for clinical errors. MEDIC outperformed the other systems by a wide margin: the competing LLMs’ instructions contained between 50% and 400% more errors.
To further gauge MEDIC’s effectiveness, the researchers tested it through Amazon Pharmacy’s prescription-production system. Specifically, they assessed whether MEDIC could avoid typographic errors, incomplete entries, and other mistakes that pharmacist audits are supposed to catch before patients receive the instructions. “These near-misses can be used as a proxy for errors,” Bayati notes.
MEDIC reduced near-misses by approximately 33%, outperforming the existing system. “In patient safety, we talk about a Swiss cheese model,” Tawfik says. “Each safety system has some holes—none is perfect. So if the holes all line up, an error can get through and reach the patient. If you can reduce those holes, you can really improve safety. This is a great example of that.”
Not a replacement for pharmacists
An AI-based system like MEDIC could be highly valuable in part because it can avoid the human errors that result from the stress of updating electronic medical records and pharmacy medical systems. “Errors happen when people are fatigued or working past their peak productive time,” Bayati says. His past research with Tawfik illuminates the link between burdensome medical record systems and provider burnout.
“This assembly-line type of work of translating from one system to another can predispose to cutting corners and burnout,” Tawfik says. In unpublished research, Tawfik found that for every five hours a healthcare provider works, their risk of making an error rises by about 3%. The risk grows when providers work for four or more consecutive days.
Implementing a system like MEDIC could also free pharmacists to do higher-level work rather than routine translations or audits. “They could focus more on the mechanics and pharmacokinetics of medication rather than just transcribing from one system to another,” Tawfik says.
Bayati agrees: “We can use AI to take out the repetitive, less intellectual tasks so clinicians can focus on caring for the patient, which is presumably more rewarding for them.”
Yet the researchers acknowledge that creating and instituting a system like MEDIC would come with unique challenges.
One barrier is the scale and nature of human effort involved. “For any AI technology, you need very close collaboration between clinicians and the technical team, like the data scientists and machine-learning engineers, along with the product team,” Bayati says. “You need to get all those experts sitting at the table together exchanging ideas iteratively, and it can be difficult to incentivize that close collaboration or create the right cultural training for it.”
Tawfik points to a different kind of challenge: “In a complex, high-stakes domain like medicine, we really need to have a human in the loop. There are still important things that a pharmacist may catch that an AI system can’t. And we don’t have a regulatory framework in place for what happens if AI makes an error. We have to find the right balance between machine and human.”
Published Aug. 12, 2025, in Stanford Graduate School of Business Insights.