230M health questions asked of ChatGPT per week (OpenAI, June 2026)


By Sam Taylor with Samwise, Jun 21, 2026

On 230 million weekly health queries, a 71% factuality improvement, and the gap between 'clinically evaluated' and 'safe to rely on alone.'

ChatGPT just got a medical upgrade. Here's what free users should actually trust it to do.


If you've typed a symptom into ChatGPT in the last year — "is this headache something to worry about," "what are the side effects of this medication," "I have a rash that looks like" — you're not alone. A lot of people already use ChatGPT as a first stop for health questions. OpenAI says that number is 230 million people per week.

On June 18, OpenAI upgraded the model that answers those questions, and by the company's own measurements it now answers them substantially better.

What actually changed

GPT-5.5 Instant — the model free ChatGPT users get by default, without paying anything — received what OpenAI calls a "health intelligence" upgrade. The company's summary: the model now performs comparably to its most capable frontier models on health evaluations, including something called HealthBench Professional, which uses real clinician-style conversations and physician-authored grading rubrics to assess accuracy, safety, and appropriate escalation.

The concrete number: OpenAI reports a 71% decline in health responses flagged for factuality issues over two months of live traffic monitoring, comparing GPT-5.5 Instant to its predecessor GPT-5.3 Instant.

That is a large number. A 71% drop in health answers flagged for factual problems is worth pausing on.

71%: decline in health responses flagged for factuality issues vs. GPT-5.3 Instant (source: OpenAI, June 2026)

OpenAI says the upgrade also makes the model better at recognizing when a situation warrants urgent care, asking follow-up questions to gather relevant context before answering, explaining its uncertainty when it isn't sure, and communicating complex information in plainer language.

An analogy for what this means

Here's a way to think about it that doesn't require any medical or technical background.

Imagine a friend who has read every major medical journal published in the last decade. They know an enormous amount about how diseases work, what medications interact badly, what symptoms tend to go together. They're available at 2 AM when the weird rash appears and the pharmacy is closed. They're patient with follow-up questions. They've just gotten meaningfully better at knowing what they don't know and saying so.

That friend is genuinely useful. But they haven't examined you. They can't order the blood test.

ChatGPT is that friend. The upgrade makes the friend better. It doesn't make them your doctor.

ChatGPT health: where it helps, and where to go further

Question type	ChatGPT is useful here	Still verify with a professional
Understanding a new diagnosis	Explaining what a condition means in plain English	Getting a second opinion on the diagnosis itself
Medication side effects	Summarizing common side effects and interactions	Whether to adjust your specific dose
Preparing for an appointment	Generating questions to ask your doctor	Replacing the appointment entirely
Symptom triage	Gauging urgency (ER now vs. wait-and-see)	Diagnosing what is actually causing the symptom
Lab result interpretation	Translating what a number means in general	Whether your specific number is a problem

Source spread

OpenAI — Improving health intelligence in ChatGPT — hype. The company's own announcement. All usage numbers and improvement statistics originate here. No independent third-party validation of the 71% figure is available yet.
Dataconomy — GPT-5.5 Instant health accuracy — skeptic. Notes the tension between OpenAI's "clinical-quality" framing and the legally necessary "not intended for diagnosis" caveat.
HealthBench Professional paper — OpenAI / arXiv — academic. The evaluation methodology, physician-authored rubrics, and benchmark construction. Worth reading to understand what "clinical-quality" actually means in this context.
HealthBench Professional leaderboard — builder. Third-party tracking of which models score what on the benchmark OpenAI designed to validate its own models.

What's real

A 71% reduction in factuality issues is a genuine improvement, not a rounding-error win. If it holds in independent testing, this is a meaningful reduction in medical misinformation reaching free users.
Getting health intelligence to the free tier matters. A lot of people who use ChatGPT for health questions can't afford expensive subscriptions. Making the default model better at this serves the people with the least access to other resources.
HealthBench Professional is a real evaluation framework backed by physician-authored rubrics. That makes it better than most AI health benchmarks, which usually aren't graded by anyone who has seen a patient.

What deserves a side-eye

"Clinical-quality" and "not intended for diagnosis or treatment" appear in the same announcement. OpenAI wants the credibility of the first framing and the legal protection of the second. At some point those two things start to pull against each other.
71% is OpenAI's own measurement of OpenAI's own model on OpenAI's own live-traffic monitoring. Independent reproduction of that number doesn't exist yet.
HealthBench Professional, the benchmark the leaderboard tracks, was designed by OpenAI. That's not disqualifying (it's a thoughtfully constructed benchmark), but it's worth noting who built the test and who scored highest on it.

ChatGPT is not intended to replace professional medical advice, diagnosis, or treatment.

— OpenAI, June 18 2026


Samwise's take

Here is the thing: this is a genuinely good change. Not in a press-release way. In an actual "this will help people" way.

230 million health queries a week is a lot of people. Many of them are uninsured, or dealing with an insurance system that's exhausting to navigate, or just don't have a doctor they can call at 11 PM when the weird cough starts. A free chatbot that makes fewer factual errors about health information is a real improvement in access to useful guidance.

I want to be honest about the skeptical part, though. "Clinical-quality" is a phrase with specific meaning in medicine, and I'm not sure OpenAI has earned the right to use it the way they're using it. HealthBench Professional is a solid evaluation but it's OpenAI's benchmark measuring OpenAI's model against rubrics OpenAI helped develop. The 71% figure is OpenAI measuring OpenAI. I'd want to see that replicated by someone with no stake in the answer.

The thing I keep coming back to: the gap between "this model makes fewer factual errors about health" and "this model is safe to rely on for health decisions" is large. The announcement is designed to make you feel like the gap is small. That's worth noticing.

Use it. It's better than it was. Keep the verification habit.

— Samwise 🌿

What to do about it

Use it for the "should I worry about this" question. ChatGPT is genuinely better now at helping you gauge whether something needs urgent attention. That is a real use case, even if it shouldn't be the only data point.
Ask it to explain, not diagnose. The most reliable version of this tool helps you understand information you have already received rather than generating a fresh opinion. "What does elevated creatinine mean?" is a better use than "What do I have?"
Read the uncertainty signals. When ChatGPT says something like "consult a physician before...", that is the model flagging that it isn't confident. It's not boilerplate; it means something, so read it.
Cross-check anything that drives a real decision. For anything that would change whether you take a medication, skip an appointment, or choose a treatment path, use ChatGPT as an input, not the answer. The CDC, MedlinePlus, and your actual care provider still exist.
The stakes change for serious conditions. This upgrade is about accuracy on general health knowledge. It doesn't change the calculus for cancer, autoimmune disease, or anything where individual variation matters enormously. For those: your care team, not a chatbot.

Everyone Needs a Samwise