Adam Zewe

Health Care

Teaching AI to Ask Clinical Questions

Research on machine-learning models that can help doctors consult patient records more easily

Published: Thursday, August 4, 2022 - 12:02

Physicians often query a patient’s electronic health record for information that helps them make treatment decisions, but the cumbersome nature of these records hampers the process. Research has shown that even when a doctor has been trained to use an electronic health record (EHR), finding an answer to just one question can take, on average, more than eight minutes.

The more time physicians must spend navigating an oftentimes clunky EHR interface, the less time they have to interact with patients and provide treatment.

Researchers have begun developing machine-learning (ML) models that can streamline the process by automatically finding information physicians need in an EHR. However, training effective models requires huge datasets of relevant medical questions, which are often hard to come by due to privacy restrictions. Existing models struggle to generate authentic questions—those that would be asked by a human doctor—and are often unable to successfully find correct answers.

To overcome this data shortage, researchers at MIT partnered with medical experts to study the questions physicians ask when reviewing EHRs. Then, they built a publicly available dataset of more than 2,000 clinically relevant questions written by these medical experts.

When they used their dataset to train an ML model to generate clinical questions, they found that the model produced questions judged to be high quality and authentic, when compared with real questions from medical experts, more than 60 percent of the time.

With this dataset, they plan to generate vast numbers of authentic medical questions and then use those questions to train an ML model that would help doctors find sought-after information in a patient’s record more efficiently.

“Two thousand questions may sound like a lot, but when you look at machine-learning models being trained nowadays, they have so much data, maybe billions of data points,” says lead dataset author Eric Lehman, a graduate student in the Computer Science and Artificial Intelligence Laboratory (CSAIL). “When you train machine-learning models to work in healthcare settings, you have to be really creative because there is such a lack of data.”

The senior author is Peter Szolovits, a professor in MIT’s Department of Electrical Engineering and Computer Science (EECS) who heads the Clinical Decision-Making Group in CSAIL and is also a member of the MIT-IBM Watson AI Lab. The research paper, a collaboration between co-authors at MIT, the MIT-IBM Watson AI Lab, IBM Research, and the doctors and medical experts who helped create questions and participated in the study, was presented at the annual conference of the North American Chapter of the Association for Computational Linguistics.

“Realistic data are critical for training models that are relevant to the task yet difficult to find or create,” Szolovits says. “The value of this work is in carefully collecting questions asked by clinicians about patient cases, from which we are able to develop methods that use these data and general language models to ask further plausible questions.”

Data deficiency

The few large datasets of clinical questions the researchers were able to find had a host of issues, Lehman explains. Some were composed of medical questions asked by patients on web forums, which are a far cry from physician questions. Other datasets contained questions produced from templates, so they were mostly identical in structure, making many of the questions unrealistic.

“Collecting high-quality data is really important for doing machine-learning tasks, especially in a healthcare context, and we’ve shown that it can be done,” Lehman says.

To build their dataset, the MIT researchers worked with practicing physicians and medical students in their last year of training. They gave these medical experts more than 100 EHR discharge summaries and told them to read through a summary and ask any questions they might have. The researchers didn’t put any restrictions on question types or structures in an effort to gather natural questions. They also asked the medical experts to identify the “trigger text” in the EHR that led them to ask each question.

For instance, a medical expert might read a note in the EHR that says a patient’s past medical history is significant for prostate cancer and hypothyroidism. The trigger text “prostate cancer” could lead the expert to ask questions like “date of diagnosis?” or “any interventions done?”
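As a rough illustration of the pairing described above, each entry in such a dataset could link a trigger span to the questions it prompted. The field names below are hypothetical, chosen for readability; they are not the actual schema of the released dataset:

```python
# Hypothetical sketch of one record pairing a trigger span with the
# questions it prompted. Field names are illustrative, not the real schema.
record = {
    "summary_id": "discharge-0421",      # which discharge summary the note came from
    "trigger_text": "prostate cancer",   # the span that prompted the questions
    "questions": [
        "date of diagnosis?",
        "any interventions done?",
    ],
}

def questions_for_trigger(records, trigger):
    """Collect every question tied to a given trigger span."""
    return [q for r in records
            if r["trigger_text"] == trigger
            for q in r["questions"]]

print(questions_for_trigger([record], "prostate cancer"))
```

Keeping the trigger text alongside each question is what lets a generation model later be conditioned on a specific span of the record, rather than on the whole discharge summary.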

They found that most questions focused on symptoms, treatments, or the patient’s test results. While these findings weren’t unexpected, quantifying the number of questions about each broad topic will help them build an effective dataset for use in a real clinical setting, says Lehman.

Once they had compiled their dataset of questions and accompanying trigger text, they used it to train ML models to ask new questions based on the trigger text.

Then the medical experts determined whether those questions were “good” using four metrics: understandability (Does the question make sense to a human physician?), triviality (Is the question too easily answerable from the trigger text?), medical relevance (Does it make sense to ask this question based on the context?), and relevance to the trigger (Is the trigger related to the question?).
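The four-part rubric can be sketched as a simple scoring record. Note that the field names, and the aggregation rule that a question counts as “good” only when it passes all four checks, are assumptions made for illustration, not details stated by the researchers:

```python
from dataclasses import dataclass

@dataclass
class QuestionJudgment:
    """One expert's judgment of a generated question (illustrative fields)."""
    understandable: bool      # does the question make sense to a physician?
    nontrivial: bool          # is it NOT answerable directly from the trigger text?
    medically_relevant: bool  # does it make sense to ask, given the context?
    trigger_related: bool     # is the trigger actually related to the question?

    def is_good(self) -> bool:
        # Assumed aggregation rule: a question is "good" only if it
        # passes all four checks.
        return all([self.understandable, self.nontrivial,
                    self.medically_relevant, self.trigger_related])

judgments = [
    QuestionJudgment(True, True, True, True),
    QuestionJudgment(True, False, True, True),  # trivially answerable -> not good
]
good_rate = sum(j.is_good() for j in judgments) / len(judgments)
print(good_rate)  # 0.5
```

Aggregating `is_good` over many judged questions is how a figure like the 63 percent reported below could be computed.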

Cause for concern

The researchers found that when a model was given trigger text, it was able to generate a good question 63 percent of the time, whereas a human physician would ask a good question 80 percent of the time.

They also trained models to recover answers to clinical questions using the publicly available datasets they had found at the outset of this project. Then they tested these trained models to see if they could find answers to “good” questions asked by human medical experts.

The models were only able to recover about 25 percent of answers to physician-generated questions.

“That result is really concerning,” Lehman says. “What people thought were good-performing models were, in practice, just awful because the evaluation questions they were testing on weren’t good to begin with.”

The team is now applying this work toward its initial goal: building a model that can automatically answer physicians’ questions in an EHR. For the next step, they will use their dataset to train a machine-learning model that can automatically generate thousands or millions of good clinical questions, which can then be used to train a new model for automatic question answering.

Although there is still much work to do before that model could be a reality, Lehman is encouraged by the strong initial results the team demonstrated with this dataset.

First published July 14, 2022, on MIT News.


About The Author


Adam Zewe

Adam Zewe is a writer for Massachusetts Institute of Technology, covering the electrical engineering and computer science beat in the MIT News Office.