
Experts Raise Concerns: ChatGPT Health’s Inability to Identify Medical Emergencies Is ‘Extremely Hazardous’

Recent evaluations of ChatGPT Health have revealed significant shortcomings, particularly in identifying critical medical situations and detecting suicidal thoughts. Experts express concern that these failures could ultimately lead to harmful outcomes and even fatalities.

OpenAI introduced the “Health” feature of ChatGPT to select users in January, aiming to offer a platform where individuals can “securely connect medical records and wellness apps” to receive tailored health advice. More than 40 million users reportedly consult ChatGPT daily for medical guidance.

An independent safety evaluation, published in the February edition of Nature Medicine, found that ChatGPT Health under-triaged over half of the presented medical scenarios.

The lead author of the study, Dr. Ashwin Ramaswamy, stated, “We aimed to address a fundamental safety question: If someone is experiencing a genuine medical emergency and queries ChatGPT Health, will it instruct them to seek urgent help at the hospital?”

Dr. Ramaswamy and his colleagues designed 60 authentic patient scenarios that ranged from mild ailments to life-threatening emergencies, with three independent medical professionals assessing the required level of care according to established clinical guidelines.


The research team then queried ChatGPT Health for advice on these scenarios under varying conditions, such as changing the patient’s gender, adjusting test outcomes, or incorporating comments from relatives, resulting in nearly 1,000 distinct responses.

The researchers compared the AI’s recommendations with those from the medical professionals.

While ChatGPT Health accurately addressed textbook emergency cases—like strokes or severe allergic reactions—it struggled with other conditions. For example, in one asthma scenario, the platform recommended delaying emergency care, despite recognizing critical early warning signs of respiratory failure.

In 51.6% of instances where immediate hospital care was necessary, the platform suggested that patients either remain at home or schedule a routine appointment. Alex Ruani, a doctoral researcher studying health misinformation at University College London, called this outcome “unbelievably dangerous.”

“In situations of respiratory failure or diabetic ketoacidosis, there’s a 50% chance the AI might downplay the urgency,” she cautioned. “The growing concern is the false sense of security these systems provide. If individuals are advised to wait 48 hours during an asthma attack or diabetic emergency, that reassurance might cost them their lives.”

In one scenario, the AI directed a woman who was suffocating to a follow-up appointment she might not survive in roughly 84% of its responses. Conversely, in 64.8% of cases where individuals had no urgent medical issue, the platform advised them to seek immediate medical attention, according to Ruani, who was not involved in the study.

The AI was also nearly 12 times more likely to minimize symptoms when the “patient” mentioned that a “friend” had suggested there was no serious issue.

“This illustrates the pressing need for us to establish clear safety protocols and independent audit processes to prevent avoidable harm,” Ruani remarked.

In response to these findings, an OpenAI spokesperson stated that while the evaluation of healthcare AI systems is welcomed, the study does not accurately depict typical usage of ChatGPT Health in real-world situations. They emphasized that the model is undergoing continuous improvements and updates.

Ruani insisted that even though simulations were used in this research, “the inherent risk of harm necessitates stronger safety measures and independent oversight.”

Dr. Ramaswamy, who also serves as a urology instructor at the Icahn School of Medicine at Mount Sinai in the U.S., expressed particular concern about the platform’s inadequate response to suicidal ideation.

“We tested ChatGPT Health with a 27-year-old who reported contemplating self-harm by ingesting pills,” he explained. When the researchers described the patient’s symptoms in isolation, a crisis intervention banner offering suicide support appeared consistently.

“However, when we adjusted the scenario to include normal lab test results, the crisis banner disappeared entirely,” Ramaswamy noted. “Despite the identical language and severity, the AI completely failed to intervene. Relying on a guardrail that varies based on lab results is potentially more hazardous than having no such guardrail, since its reliability is unpredictable.”

Prof. Paul Henman, a digital sociologist and policy expert at the University of Queensland, affirmed the importance of the findings. He stated, “If people at home rely on ChatGPT Health, this could result in increased visits for minor conditions while simultaneously leading to neglect of urgent medical needs, which could realistically result in unnecessary harm or even fatality.”

He also highlighted potential legal implications, noting a growing number of lawsuits targeting tech entities related to suicide and self-harm incidents sparked by AI chatbots.

“The intent behind OpenAI’s creation of this product is unclear, as are its training methodology, its established guardrails, and the warnings it provides to users,” Henman stated.

“Without knowledge of ChatGPT Health’s training processes and contextual application, we lack clarity on what is ingrained in its models.”

