2023-08-12

The patient was a 39-year-old woman. She had had pain in her left knee for several days. The day before, she had had a fever. It had already subsided, but she still had chills. And her knee was red and swollen.

What was the diagnosis?

Megan Landon, a medical resident at Beth Israel Deaconess Medical Center in Boston, recently presented this real case to medical students and residents gathered to learn to think like a doctor.

But this time, they could turn to GPT-4, the latest version of a chatbot released by OpenAI.

Doctors at Beth Israel Deaconess, a teaching hospital affiliated with Harvard Medical School, decided to explore how chatbots could be used, and misused, in training future doctors.

Instructors like Adam Rodman hope that medical students can turn to chatbots for something similar to what doctors call a curbside consult, when they ask a colleague for an opinion on a difficult case.

Experienced doctors use what is called an illness script: signs, symptoms, and test results that tell a coherent story based on similar cases. If the illness script doesn’t help, Rodman said, doctors turn to other strategies, such as assigning probabilities to various diagnoses.

Researchers have spent years trying to design computer programs that make diagnoses, but none has succeeded.

Doctors say that GPT-4 is different.

In a study published in JAMA, doctors at Beth Israel Deaconess found that GPT-4 outperformed most doctors on weekly diagnostic challenges published in The New England Journal of Medicine.

But they learned that there is an art to using the program, and that it has its pitfalls. Christopher Smith, director of residents at the medical center, said that learning involves trying to figure things out: “Part of learning is struggling. If you outsource the learning to GPT, there is no longer a struggle.”

At the meeting, students and residents teamed up to find out what was wrong with the patient with the swollen knee. Then they turned to GPT-4.

One group used GPT-4 to do an Internet search, similar to using Google. The chatbot offered possible diagnoses, including trauma. But when the group asked it to explain its reasoning, the bot disappointed, saying only, “Trauma is a common cause of knee injury.”

Another group thought of possible hypotheses and asked GPT-4 to review them. The chatbot’s list matched the group’s: infections, including Lyme disease; arthritis, including gout; and trauma.

GPT-4 added rheumatoid arthritis to the possibilities. Gout, the instructors later told the group, was unlikely because the patient was young and female. And rheumatoid arthritis could probably be ruled out because only one joint was inflamed, and only for a short time.

To use the bot correctly, the instructors said, users should start by telling GPT-4 something like: “You are a doctor treating a 39-year-old woman with knee pain.” They would then list the patient’s symptoms before requesting a diagnosis, and follow up with questions about the bot’s reasoning, just as they would with a colleague.

That, the instructors said, is a better way to use GPT-4. But it is also crucial to recognize that chatbots can make mistakes, and that anyone using them must keep in mind that they may be wrong.

At the end of the session, the instructors revealed the real reason for the swollen knee: the woman had Lyme disease.

Olivia Allison contributed reporting to this article.

GINA KOLATA

The New York Times