Study Warns: ChatGPT Health 'Under-Triaged' Over Half of Medical Emergencies

A new study published in Nature Medicine reveals significant shortcomings in OpenAI's health-focused chatbot, ChatGPT Health, when it assesses medical emergencies. Across 60 scenarios based on real cases, researchers found that the AI failed to recommend an immediate emergency room visit in over half (51.6%) of urgent cases, instead suggesting care within 24 to 48 hours, a delay that could cost patients life-saving treatment.

Concerns Raised Over AI Health Assistant's Performance in Emergency Triage

As artificial intelligence becomes increasingly integrated into healthcare, its reliability and safety are under scrutiny. A new study led by researchers at Mount Sinai Hospital in New York and published in the prestigious journal Nature Medicine evaluated OpenAI's dedicated health chatbot, ChatGPT Health, with alarming results.

Research Methodology and Striking Findings

The research team presented ChatGPT Health with 60 medical scenarios based on real cases, each rendered in 16 variations of patient gender and race to test the AI's consistency. Three physicians independently triaged the same scenarios using professional guidelines, and their assessments served as the control standard.

The study found that in cases where physicians unequivocally determined an immediate emergency room (ER) visit was necessary, ChatGPT Health "under-triaged" 51.6% of the time. In critical situations such as diabetic ketoacidosis (a life-threatening diabetes complication) or impending respiratory failure, the AI did not recommend urgent care but instead suggested "seeing a doctor within 24 to 48 hours."

Lead study author Dr. Ashwin Ramaswamy noted, "Any doctor, and any person who's gone through any degree of training, would say that that patient needs to go to the emergency department." The AI seemed to be "waiting for the emergency to become undeniable" before recommending the ER.

Context and Warnings

ChatGPT Health is OpenAI's health-specific product, separate from its general chatbot, which the company promotes as a more secure platform for users' medical information. OpenAI officially states that the tool is "not intended for diagnosis or treatment." Even so, over 40 million people globally already use ChatGPT for health-related queries.

While prior research has shown that ChatGPT can pass medical licensing exams, and most physicians now use AI in some form, this study highlights the current limitations of AI in complex, high-stakes clinical judgment.

Recommendations for Users

  • Understand the Tool's Purpose: AI health assistants can be a source of information but must never replace professional medical diagnosis.
  • Recognize Emergency Symptoms: For acute symptoms like chest pain, difficulty breathing, severe trauma, or loss of consciousness, call emergency services or go to the ER immediately. Do not let an AI assessment delay care.
  • Consult a Professional: For any health concerns, especially persistent or worsening symptoms, always consult a doctor or pharmacist.

While technological advances bring convenience to health management, human professional judgment and emergency response systems remain the irreplaceable foundation for life-critical medical decisions.