Vol. 1 · Edition 027 · Free · No paywall

Everyone Needs a Samwise

AI news · Synthesized · Opinionated · 🌿

28% of papers flagged AI-written at NeurIPS 2026, the world's biggest AI research conference
Industry
By Sam Taylor with Samwise

On the NeurIPS 28.2% result, why the detection tool controversy matters beyond AI research conferences, and how to read expert-cited science in 2026

AI is ghostwriting the papers that experts cite. The detection problem is everyone's now.

Source lean on this story (← Anti-AI · Pro-AI →): Anti-AI 0 · Skeptic 2 · Neutral 1 · Pro (practical) 0 · Pro (hyped) 0

Every week, something gets reported as "a new study shows." The study comes from a university. A journalist cites it. Your doctor cites the journalist. Someone on social media cites the doctor. The information travels in a chain, and most of the people in that chain never read the original study. They trust the chain.

That chain has a new, mostly invisible link: AI writing the research papers themselves.

NeurIPS 2026 is the world's largest AI research conference — the place where foundational ideas about how artificial intelligence works get published, debated, and eventually cited by the researchers who build the next round of systems. In June, NeurIPS received 969 submissions to its position paper track (essays about where AI research is and where it's going) and ran every one through an AI-detection tool. The tool is called Pangram v3.3.2 — Pangram is a company that sells AI writing detection software. What came back: 273 of those 969 papers — 28.2% — had every text segment the tool examined classified as AI-generated. 178 were rejected outright, without appeal. Another 123 were given until June 15 to prove a human wrote them.
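The headline percentage is easy to check for yourself. Here is a minimal sketch in Python that uses only the two counts reported above; the variable names are just illustrative labels.

```python
# Counts as reported in this article.
submissions = 969      # position paper submissions
fully_flagged = 273    # papers with every examined segment classified as AI-generated

# Share of submissions that were fully flagged.
share = fully_flagged / submissions

# Format as a percentage with one decimal place.
print(f"{share:.1%}")
```

Rounded to one decimal place, this gives the 28.2% figure quoted above.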

28% of NeurIPS 2026 position paper submissions flagged as AI-generated by Pangram v3.3.2 (Source: NeurIPS 2026 / Pangram)

Here is the part that matters beyond the AI research community: the same detection tool, applied to papers that had already been accepted at the prior ICLR 2026 conference, flagged 1% of them. One percent at the prior conference, 28% at this one. Same software. Different rates. Researchers whose human-written work got flagged at NeurIPS have argued that formal academic writing patterns (long sentences, citation-heavy structure, certain vocabulary choices) push detection scores upward regardless of whether a human or a model wrote the prose.

What that gap reveals: nobody currently knows how to reliably distinguish AI-written research from human-written research. Including the people who sell the tools claiming to detect it.
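To see why a flag rate on its own settles so little, here is a rough sketch in Python. It uses a standard correction: if a detector catches a fraction `sensitivity` of AI-written papers and wrongly flags a fraction `false_positive_rate` of human-written ones, then the observed flag rate equals prevalence × sensitivity + (1 − prevalence) × false_positive_rate, and you can solve for prevalence. The sensitivity and false-positive values below are hypothetical placeholders, not measured properties of Pangram or any other tool. Only the 273 and 969 come from the reporting above.

```python
def implied_prevalence(flag_rate: float,
                       sensitivity: float,
                       false_positive_rate: float) -> float:
    """Solve flag_rate = p * sensitivity + (1 - p) * false_positive_rate for p.

    Returns the implied share of truly AI-written papers, clipped to [0, 1].
    """
    if sensitivity <= false_positive_rate:
        raise ValueError("detector must flag AI text more often than human text")
    p = (flag_rate - false_positive_rate) / (sensitivity - false_positive_rate)
    return min(1.0, max(0.0, p))


# Observed flag rate from the article's counts.
flag_rate = 273 / 969

# HYPOTHETICAL detector characteristics, chosen only to show the spread.
assumed_sensitivity = 0.95
for assumed_fpr in (0.01, 0.10, 0.25):
    p = implied_prevalence(flag_rate, assumed_sensitivity, assumed_fpr)
    print(f"assumed false-positive rate {assumed_fpr:.0%} -> implied AI-written share {p:.1%}")
```

Under these made-up assumptions, the same 28.2% flag rate implies anything from roughly 29% of papers being AI-written down to under 5%. The answer depends on the detector's error rate on formal academic prose, and that is exactly the number in dispute.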

Why this matters beyond AI conferences

NeurIPS is an AI conference. The papers are mostly read by AI researchers. Most people will never encounter one of these 969 submissions directly. But there are two reasons to care even if you've never heard of NeurIPS before today.

First: the incentive to outsource research writing to AI isn't unique to AI conferences. Every research institution faces the same pressure to publish more, faster, with AI writing tools available on every laptop. Whatever fraction of NeurIPS submissions used AI for their prose, the same tools and the same pressures are present in nutrition research, behavioral economics, clinical medicine, climate science. NeurIPS is a visible, documented case. Other fields have the same conditions running in the background.

Second: the detection tool controversy NeurIPS surfaced reveals that nobody currently has a reliable method for establishing how much research in any field is AI-written. Not NeurIPS. Not anyone. Which means the research published in 2024, 2025, and 2026 across scientific fields carries an implicit uncertainty about its origins that didn't exist three years ago.

Here's an analogy. Imagine a restaurant where some of the kitchen staff now use a cooking machine that can generate dishes instantly. The health inspector has a tool to detect machine-made dishes, but nobody knows how often it is wrong: it flags 28% of the dishes in one kitchen and 1% in another. Some of the flagged dishes are homemade. Some machine-made ones aren't caught. The inspector can't establish a reliable baseline. You still eat the food, and the food might be perfectly fine, but you're now eating with a different kind of uncertainty than you had before.

That's roughly where academic publishing is right now. The food might be fine. The recipe might be real. The baseline for knowing either is shakier than most people realize.

Source spread

What's real

  • AI researchers are using AI to write their AI research papers. The NeurIPS result is large and documented.
  • The detection tools are real products used by major conferences, not niche experiments.
  • The incentive to outsource prose writing to AI — draft faster, publish more, compete for limited conference slots — is structurally present across all research fields.
  • The concern is specifically about undisclosed use. If you disclose AI assistance in your methodology, that's a different situation than submitting AI prose as your own.

What deserves a side-eye

  • "AI-written" doesn't automatically mean "wrong." A paper written by an AI from a researcher's real data and analysis might still report accurate findings. The concern is about transparency, provenance, and the erosion of the accountability the author's name is supposed to provide.
  • The detection accuracy is genuinely contested. Researchers with legitimate human-written work have been flagged. The 28% figure is not a hard fact about paper quality; it's a detection tool's output on a variable scoring system.
  • Generalizing from one AI conference to all scientific fields is reasonable speculation, not established data. NeurIPS is evidence of a pattern, not proof of universal contamination.

How to read research claims in 2026

Factor | Weaker evidence | Stronger evidence
Single study vs. replicated | "A new study shows..." | Multiple independent studies with consistent results
Journal status | Preprint (not yet peer-reviewed) | Peer-reviewed in an established journal
Expert consensus | One researcher's claim | Consensus position of major medical or scientific bodies
Replication status | No independent replication | Findings replicated by separate research teams

What to do about it

Practical adjustments here are smaller than the problem sounds. None of them require you to become a scientist.

  • When something is reported as "a new study shows," ask: has it been replicated? A single study claiming something is weak evidence regardless of how it was written. Multiple independent studies reaching the same result are much harder to fake, fabricate, or AI-generate at scale.
  • Know the difference between a preprint and peer review. A paper posted to arXiv or bioRxiv hasn't been peer-reviewed yet — it's a draft. Most responsible journalism cites peer-reviewed work, but not always, and the distinction is worth checking when the claim is high-stakes.
  • Expert consensus is sturdier than individual studies. When major medical or scientific bodies publish consensus statements based on multiple studies, that represents a higher bar than any single finding in a journal.
  • For medical, financial, or safety decisions, get a second opinion from someone whose job is to read the research. Your doctor, financial advisor, or relevant professional has access to context that a single headline doesn't carry. This has always been good advice; AI ghostwriting is one more reason to follow it.
  • You don't need to read the original studies. But when a claim is important enough to act on, asking "where did this come from, and has anyone else found the same thing?" takes about 90 seconds.
