Zealous and thorough in diagnosis and care, like a recent medical graduate or resident. Doctor ChatGPT has once again been put to the test in a scientific study, and overall it earns a 'promotion' from the authors of the work. They are researchers at Mass General Brigham in the USA, who speak of "impressive accuracy" in clinical decision-making. Their conclusion, in other words: Doctor ChatGPT is young, but it has what it takes to grow.

To be precise, according to the study, this chatbot is built on a frontier of artificial intelligence, the large language model (LLM). It achieved an accuracy of 72% across the clinical decision-making process as a whole. That process runs from identifying the possible diagnoses to be evaluated, to establishing the final diagnosis, up to decisions on care management. ChatGPT also performed "equally well" in both primary and emergency care and across all medical specialties. The research team's findings were published in the 'Journal of Medical Internet Research'.

"Our paper," explains the study's corresponding author, Marc Succi, "comprehensively evaluates decision support via ChatGPT from the very beginning of work with a patient through the entire care scenario, from differential diagnosis through testing, to diagnosis and management. There are no real benchmarks, but we estimate that this performance is at the level of someone who has just graduated from medical school, such as an intern or a resident. This tells us that LLMs in general have the potential to be an empowering tool for the practice of medicine and to support clinical decision-making with precision." The study shows how these models could be used in clinical counseling and decision-making.

Succi and his team tested a hypothesis: that ChatGPT may be able to work through an entire clinical encounter with a patient, recommend a diagnostic workup, decide the course of clinical management and ultimately formulate the final diagnosis.

As part of the research, ChatGPT was first asked to come up with a set of possible, or differential, diagnoses based on the patient's initial information, which included age, gender, symptoms, and whether the case was an emergency. It was then given additional information and asked to make management decisions and provide a final diagnosis, simulating the entire process of seeing a real patient. In a blinded, structured evaluation, the team compared ChatGPT's accuracy across differential diagnosis, diagnostic testing, final diagnosis and management, awarding points for correct answers. It then used statistical tools to evaluate the relationship between ChatGPT's performance and the demographic information contained in 36 clinical vignettes.

The task in which ChatGPT performed best was the final diagnosis, where it achieved an accuracy of 77%. Its lowest performance was in differential diagnosis (60%). It was also only 68% accurate in clinical management decisions, such as working out which drugs to treat the patient with after arriving at the correct diagnosis. Any other details from the study? ChatGPT showed no gender bias. In short, it was also politically correct.

"It did struggle somewhat with differential diagnosis. This tells us where flesh-and-blood doctors are truly expert and bring their greatest added value," reflects Succi. However, the authors point out that further benchmark research and regulatory guidance are needed before tools such as ChatGPT can be considered for integration into clinical care. "Mass General Brigham sees great promise for LLMs and their contributions to improving health care delivery and the physician experience," concludes co-author Adam Landman. "We are currently evaluating solutions that help with clinical documentation and drafting responses to patient messages."