Vol. 1 · Edition 025 · Free · No paywall

Everyone Needs a Samwise

AI news · Synthesized · Opinionated · 🌿

230M health questions asked of ChatGPT per week (OpenAI, June 2026)
Product
By Sam Taylor with Samwise

On 230 million weekly health queries, a 71% factuality improvement, and the gap between 'clinically evaluated' and 'safe to rely on alone.'

ChatGPT just got a medical upgrade. Here's what free users should actually trust it to do.

Source lean on this story (scale: Anti-AI to Pro-AI): Anti-AI 0 · Skeptic 2 · Neutral 0 · Pro (practical) 0 · Pro (hyped) 1

If you've typed a symptom into ChatGPT in the last year — "is this headache something to worry about," "what are the side effects of this medication," "I have a rash that looks like" — you're not alone. A lot of people already use ChatGPT as a first stop for health questions. OpenAI says that number is 230 million people per week.

On June 18, OpenAI substantially improved the model that answers those questions.

What actually changed

GPT-5.5 Instant — the model free ChatGPT users get by default, without paying anything — received what OpenAI calls a "health intelligence" upgrade. The company's summary: the model now performs comparably to its most capable frontier models on health evaluations, including something called HealthBench Professional, which uses real clinician-style conversations and physician-authored grading rubrics to assess accuracy, safety, and appropriate escalation.

The concrete number: OpenAI reports a 71% decline in health responses flagged for factuality issues over two months of live traffic monitoring, comparing GPT-5.5 Instant to its predecessor GPT-5.3 Instant.

That is a large number. A 71% drop in responses flagged as wrong or misleading is worth stopping on.

71% decline in health responses flagged for factuality issues vs. GPT-5.3 Instant (Source: OpenAI, June 2026)

The upgrade also improves how the model recognizes when a situation warrants urgent care, asks follow-up questions to get relevant context before answering, explains uncertainty when it isn't sure, and communicates complex information in plainer language.

An object lesson in what this means

Here's a way to think about it that doesn't require any medical or technical background.

Imagine a friend who has read every major medical journal published in the last decade. They know an enormous amount about how diseases work, what medications interact badly, what symptoms tend to go together. They're available at 2 AM when the weird rash appears and the pharmacy is closed. They're patient with follow-up questions. They've just gotten meaningfully better at knowing what they don't know and saying so.

That friend is genuinely useful. But they haven't examined you. They can't order the blood test.

ChatGPT is that friend. The upgrade makes the friend better. It doesn't make them your doctor.

ChatGPT health: where it helps, and where to go further

Question type | ChatGPT is useful here | Still verify with a professional
Understanding a new diagnosis | Explaining what a condition means in plain English | Getting a second opinion on the diagnosis itself
Medication side effects | Summarizing common side effects and interactions | Whether to adjust your specific dose
Preparing for an appointment | Generating questions to ask your doctor | Replacing the appointment entirely
Symptom triage | Gauging urgency (ER now vs. wait-and-see) | Diagnosing what is actually causing the symptom
Lab result interpretation | Translating what a number means in general | Whether your specific number is a problem

Source spread

What's real

  • A 71% reduction in factuality issues is a genuine improvement, not a rounding-error win. If it holds in independent testing, this is a meaningful reduction in medical misinformation reaching free users.
  • Getting health intelligence to the free tier matters. A lot of people who use ChatGPT for health questions can't afford expensive subscriptions. Making the default model better at this serves the people with the least access to other resources.
  • HealthBench Professional is a real evaluation framework backed by physician-authored rubrics. Better than most AI health benchmarks, which are usually not graded by anyone who has seen a patient.

What deserves a side-eye

  • "Clinical-quality" and "not intended for diagnosis or treatment" appear in the same announcement. OpenAI wants the credibility of the first framing and the legal protection of the second. At some point those two things start to pull against each other.
  • 71% is OpenAI's own measurement of OpenAI's own model on OpenAI's own live-traffic monitoring. Independent reproduction of that number doesn't exist yet.
  • HealthBench Professional is an OpenAI-designed benchmark. That's not disqualifying (it's thoughtfully constructed), but it's worth noting who built the test and who scored highest on it.

"ChatGPT is not intended to replace professional medical advice, diagnosis, or treatment."

OpenAI, June 18, 2026

What to do about it

  • Use it for the "should I worry about this" question. ChatGPT is genuinely better now at helping you gauge whether something needs urgent attention. That is a real use case, even if it shouldn't be the only data point.
  • Ask it to explain, not diagnose. The most reliable version of this tool helps you understand information you already received, not generate a fresh opinion. "What does elevated creatinine mean?" is a better use than "what do I have?"
  • Read the uncertainty signals. When ChatGPT says something like "consult a physician before" — that is the model flagging that it isn't confident. Not boilerplate. It means something. Read it.
  • Cross-check anything that drives a real decision. For anything that would change whether you take a medication, skip an appointment, or choose a treatment path: use ChatGPT as an input, not as the answer. The CDC, MedlinePlus, and your actual care provider still exist.
  • The stakes change for serious conditions. This upgrade is about accuracy on general health knowledge. It doesn't change the calculus for cancer, autoimmune disease, or anything where individual variation matters enormously. For those: your care team, not a chatbot.
