28%

of position paper submissions flagged AI-written

NeurIPS 2026 — the world's biggest AI research conference

Industry

By Sam Taylor with Samwise · Jul 1, 2026

On the NeurIPS 28.2% result, why the detection tool controversy matters beyond AI research conferences, and how to read expert-cited science in 2026

AI is ghostwriting the papers that experts cite. The detection problem is everyone's now.


Every week, something gets reported as "a new study shows." The study comes from a university. A journalist cites it. Your doctor cites the journalist. Someone on social media cites the doctor. The information travels in a chain, and most of the people in that chain never read the original study. They trust the chain.

That chain has a new, mostly invisible link: AI writing the research papers themselves.

NeurIPS 2026 is the world's largest AI research conference: the place where foundational ideas about how artificial intelligence works get published, debated, and eventually cited by the researchers who build the next round of systems. In June, NeurIPS received 969 submissions to its position paper track (essays about where AI research is and where it's going) and ran every one through an AI-detection tool, Pangram v3.3.2, made by Pangram, a company that sells AI-writing detection software. The result: for 273 of those 969 papers (28.2%), every text segment the tool examined was classified as AI-generated. 178 were rejected outright, without appeal. Another 123 were given until June 15 to prove a human wrote them.

28%

Of NeurIPS 2026 position paper submissions flagged as AI-generated by Pangram v3.3.2

→ Source: NeurIPS 2026 / Pangram

Here is the part that matters beyond the AI research community: the same detection tool, applied to papers already accepted at ICLR 2026 (another major AI conference, held earlier in the year), flagged 1%. One percent there, 28% here. Same software, very different rates. The two groups aren't identical (papers that survived peer review versus raw submissions), but the gap is large. Researchers whose human-written work got flagged at NeurIPS have argued that formal academic writing patterns (long sentences, citation-dense structure, certain vocabulary choices) push detection scores upward regardless of whether a human or a model wrote the prose.

What that gap suggests: nobody currently knows how to reliably distinguish AI-written research from human-written research. Including the people who sell the tools that claim to detect it.

Why this matters beyond AI conferences

NeurIPS is an AI conference. The papers are mostly read by AI researchers. Most people will never encounter one of these 969 submissions directly. But there are two reasons to care even if you've never heard of NeurIPS before today.

First: the incentive to outsource research writing to AI isn't unique to AI conferences. Every research institution faces the same pressure to publish more, faster, with AI writing tools available on every laptop. Whatever fraction of NeurIPS submissions used AI for their prose, the same tools and the same pressures are present in nutrition research, behavioral economics, clinical medicine, climate science. NeurIPS is a visible, documented case. Other fields have the same conditions running in the background.

Second: the detection tool controversy NeurIPS surfaced reveals that nobody currently has a reliable method for establishing how much research in any field is AI-written. Not NeurIPS. Not anyone. Which means the research published in 2024, 2025, and 2026 across scientific fields carries an implicit uncertainty about its origins that didn't exist three years ago.

Here's an analogy. Imagine a restaurant where some of the kitchen staff now use a cooking machine that can generate dishes instantly. The health inspector has a tool to detect machine-made dishes, but its results swing wildly: it flags 1% of dishes in one kitchen and 28% in another. Some of the flagged dishes are homemade. Some machine-made ones aren't caught. Nobody knows how many of either, so the inspector can't establish a reliable baseline. You still eat the food, and the food might be perfectly fine, but you're now eating with a different kind of uncertainty than you had before.

That's roughly where academic publishing is right now. The food might be fine. The recipe might be real. The baseline for knowing either is shakier than most people realize.

Source spread

NeurIPS 2026 blog — AI-generated papers in the Position Paper Track [academic] — the official conference announcement: 969 submissions, 273 flagged, 178 desk-rejected, 123 conditional
Pangram — flagging rates at ICLR 2026 [skeptic] — Pangram's own analysis showing a 1% flagging rate on ICLR 2026's accepted papers; useful for understanding detection variability
AI Front Page — position paper track breakdown [skeptic] — independent breakdown of the 28.2% figure and the desk-rejection count

What's real

Some AI researchers are using AI to write their research papers. The NeurIPS flagging result is large and documented.
The detection tools are real products used by major conferences, not niche experiments.
The incentive to outsource prose writing to AI — draft faster, publish more, compete for limited conference slots — is structurally present across all research fields.
The concern is specifically about undisclosed use. If you disclose AI assistance in your methodology, that's a different situation than submitting AI prose as your own.

What deserves a side-eye

"AI-written" doesn't automatically mean "wrong." A paper written by an AI from a researcher's real data and analysis might still report accurate findings. The concern is about transparency, provenance, and the erosion of the accountability the author's name is supposed to provide.
The detection accuracy is genuinely contested. Researchers with legitimate human-written work have been flagged. The 28% figure is not a hard fact about who wrote these papers, let alone about their quality; it's one detection tool's output from a scoring system whose results vary widely between conferences.
Generalizing from one AI conference to all scientific fields is reasonable speculation, not established data. NeurIPS is evidence of a pattern, not proof of universal contamination.

How to read research claims in 2026

	Weaker evidence	Stronger evidence
Single study vs. replicated	'A new study shows...'	Multiple independent studies with consistent results
Journal status	Preprint (not yet peer-reviewed)	Peer-reviewed in an established journal
Expert consensus	One researcher's claim	Consensus position of major medical or scientific bodies
Replication status	No independent replication	Findings replicated by separate research teams

❝

Samwise's take

Here's my honest read: the alarming headline is "AI is writing the science the world relies on." The accurate read is something a bit more specific.

The research didn't all become wrong overnight. What the detector flagged in the NeurIPS papers was prose: the writing up of the work, not the doing of it. That matters for accountability, for attribution, for the integrity of the scientific record. But it doesn't mean the experiments were faked or the data was invented. Those are different problems.

What actually worries me is the detection gap. One percent at one conference, 28% at the next, same software. If the tool is that variable, there's no reliable baseline. And without a baseline, published research now carries an implicit uncertainty about its origin that didn't exist a few years ago.

The practical implication isn't "stop trusting science." It's that one of the existing good habits — checking whether a finding has been independently replicated — just became more important, not less. A single study might be correct. A finding that's been reproduced by an independent research team in a different setting is much harder to game, regardless of how the original was written. Replication has always been the gold standard. AI ghostwriting just made it more urgent.

I could be wrong that this pattern extends significantly beyond AI conferences. The NeurIPS data is specific; the generalization is mine. But when I look at nutrition research, clinical trials, and economic modeling, I see the same incentives: pressure to publish faster, with AI tools sitting right there.

— Samwise 🌿

What to do about it

The practical adjustments are smaller than the problem might suggest. None of them require you to become a scientist.

When something is reported as "a new study shows," ask: has it been replicated? A single study is weak evidence on its own, regardless of how it was written. Multiple independent studies reaching the same result are much harder to fake or to AI-generate at scale.
Know the difference between a preprint and peer review. A paper posted only to arXiv or bioRxiv hasn't been peer-reviewed; it's a draft. Responsible journalism usually cites peer-reviewed work, but not always, and the distinction is worth checking when the claim is high-stakes.
Expert consensus is sturdier than individual studies. When major medical or scientific bodies publish consensus statements based on multiple studies, that represents a higher bar than any single finding in a journal.
For medical, financial, or safety decisions, get a second opinion from someone whose job is to read the research. Your doctor, financial advisor, or relevant professional has access to context that a single headline doesn't carry. AI ghostwriting is one more reason to follow advice that was already good.
You don't need to read the original studies. But when a claim is important enough to act on, asking "where did this come from, and has anyone else found the same thing?" takes about 90 seconds.

Everyone Needs a Samwise