Vol. 1 · Edition 026 · Free · No paywall

Everyone Needs a Samwise

AI news · Synthesized · Opinionated · 🌿

ICLR 2026 flagged: 1% · NeurIPS 2026 flagged: 28%
Paper
By Sam Taylor with Samwise

On 969 submissions, a 27-point detection gap versus the prior ICLR run, why em-dashes became evidence, and what it means that the field now needs AI to police AI-written research.

NeurIPS caught AI writing the AI papers. Now everyone's arguing about the detector.

Source lean on this story (← Anti-AI · Pro-AI →): Anti-AI 0 · Skeptic 2 · Neutral 0 · Pro (practical) 1 · Pro (hyped) 0

The NeurIPS 2026 Position Paper Track received 969 submissions and ran every one through Pangram v3.3.2. What came back: 273 papers — 28.2% of the total — had every analyzed text window classified as AI-generated. NeurIPS desk-rejected 178 without appeal. Another 123 were given until June 15, 2026 to produce evidence of substantial human authorship or face rejection.

The same Pangram tool, applied previously to accepted papers from ICLR 2026, flagged 1%.

28.2% vs 1%. The gap between those two numbers is the whole story.

The Position Paper Track's policy required papers to be substantially human-written — AI tools allowed for research assistance and copy-editing, but the final paper must be human prose. A reasonable line to try to hold. The enforcement mechanism — an AI detector catching AI writing — is where things got complicated.

Pangram works by breaking documents into text windows and scoring each window's probability of being AI-generated. The default configuration applied first to NeurIPS submissions flagged 42.7% of papers. NeurIPS switched to refined 100-word windows; the rate fell to 12.7%. The final rejection set used papers where 100% of windows scored as AI-generated — arriving at the 28.2% detection figure and the 18.4% desk-rejection rate.

A 30-point swing in flagging rate from a configuration change that submitting authors had no visibility into, no ability to anticipate, and no recourse against.
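The window-and-threshold logic described above can be sketched in a few lines of Python. This is an illustration only, not Pangram's code or API: the function names, the non-overlapping windows, the 0.5 threshold, and the `toy_scorer` stub are all assumptions made up for the example. It shows how one document can land on either side of an "every window flagged" rule when nothing changes except the window size.

```python
from typing import Callable, List


def split_into_windows(text: str, window_words: int) -> List[str]:
    """Split text into consecutive, non-overlapping windows of `window_words` words.

    Non-overlapping windows are an assumption; a real detector may differ.
    """
    words = text.split()
    return [
        " ".join(words[i:i + window_words])
        for i in range(0, len(words), window_words)
    ]


def fraction_flagged(
    text: str,
    score_window: Callable[[str], float],
    window_words: int = 100,
    threshold: float = 0.5,  # hypothetical cut-off, not a Pangram setting
) -> float:
    """Return the share of windows whose score meets or exceeds the threshold."""
    windows = split_into_windows(text, window_words)
    if not windows:
        return 0.0
    flagged = sum(1 for w in windows if score_window(w) >= threshold)
    return flagged / len(windows)


def meets_rejection_rule(
    text: str,
    score_window: Callable[[str], float],
    window_words: int = 100,
    threshold: float = 0.5,
) -> bool:
    """Apply an 'every window flagged' rule like the one the article describes."""
    return fraction_flagged(text, score_window, window_words, threshold) == 1.0


def toy_scorer(window: str) -> float:
    """Stand-in scorer: flags a window only if it contains the token 'AI'.

    Purely hypothetical; a real detector is a trained model.
    """
    return 1.0 if "AI" in window.split() else 0.0


# A 200-word document whose only "suspicious" token sits in the first half.
document = " ".join(["AI"] + ["word"] * 199)

coarse = fraction_flagged(document, toy_scorer, window_words=200)
fine = fraction_flagged(document, toy_scorer, window_words=100)
print(coarse, fine)
```

With one 200-word window the whole toy document counts as flagged; with two 100-word windows only half of it does, so the rejection rule no longer fires. The real detector is far more sophisticated, but the sensitivity to window size is the same kind of effect.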

The backlash started quickly. Researcher Pasquale Minervini and others found their human-written papers flagged, with the evidence suggesting that academic writing conventions (em-dashes, dense citation passages, long complex sentences) push Pangram scores up. Pangram claims a false positive rate under 0.1%, but that rate was validated on general text, not on academic writing.

Pangram detection: NeurIPS 2026 vs prior ICLR 2026 application

Metric                      | ICLR 2026 (accepted papers) | NeurIPS 2026 (submissions)
Papers analyzed             | Accepted set only           | 969 submitted
AI detection rate           | 1%                          | 28.2% (273 papers)
Window configuration effect | Standard                    | 42.7% default → 12.7% refined
Rejection threshold         | None (audit only)           | 100% Pangram score
Papers rejected             | 0                           | 178 without appeal
Conditional path            | None                        | 123 papers, June 15 deadline
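
The headline percentages follow directly from the counts reported above. A quick check, using only figures from this story (the helper name `pct` is just for illustration):

```python
TOTAL_SUBMISSIONS = 969


def pct(count: int, total: int = TOTAL_SUBMISSIONS) -> float:
    """Share of `total` as a percentage, rounded to one decimal place."""
    return round(100 * count / total, 1)


detection_rate = pct(273)       # papers with every window flagged
desk_rejection_rate = pct(178)  # papers rejected without appeal

# Differences are in percentage points, not percent.
gap_vs_iclr = round(detection_rate - 1.0, 1)
config_swing = round(42.7 - 12.7, 1)

print(detection_rate, desk_rejection_rate, gap_vs_iclr, config_swing)
```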

Pros & cons

What's real:

  • Something unusual happened in this submission pool. A 1% detection rate on ICLR accepted papers vs 28.2% on NeurIPS position paper submissions is too large a gap to explain away as entirely false positives. Even allowing for Pangram's possible miscalibration on academic text, there's a real signal in there. AI-generated academic paper submissions are happening.
  • NeurIPS's policy is reasonable. Requiring substantially human-written papers, with AI allowed for research assistance and copy-editing, is a legitimate line to draw. The alternative — no enforcement — is also not good.
  • Desk-rejecting papers with 100% Pangram scores is at least a defined bright line, not an arbitrary one. The methodology was published. Submitting authors knew the policy existed.

What deserves a side-eye:

  • A 30-point swing in detection rate (42.7% → 12.7%) based on window-size configuration choices is not a stable measurement. It's a result that depends on implementation details the tool's users can't see. "We validated this at <0.1% FPR on general text" is not the same as "we validated this at <0.1% FPR on the academic writing distribution we applied it to."
  • Academic writing has distinctive patterns — em-dashes, complex sentence structure, dense citation blocks — that look different from ordinary text and may push Pangram scores upward systematically. The ICLR validation is the only external baseline we have, and that was on accepted papers, not submitted papers. Different distribution.
  • No appeal. That's the hardest part to defend. When the cost of a false positive is destroying a researcher's conference submission without recourse, the acceptable false positive rate approaches zero. Pangram's is not zero.
  • The June 15 deadline for conditional rejections has now passed. We don't have public data on how many of the 123 were ultimately rejected vs reinstated. That absence of transparency doesn't help NeurIPS's case.
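
To put the claimed false positive rate in context, here is a rough back-of-the-envelope calculation. It assumes, purely for illustration, that the under-0.1% figure applies per paper, that papers are flagged independently, and that all 969 submissions were human-written. None of these assumptions comes from Pangram or NeurIPS.

```python
def expected_false_positives(n_papers: int, fpr: float) -> float:
    """Expected number of human-written papers wrongly flagged."""
    return n_papers * fpr


def prob_at_least_one(n_papers: int, fpr: float) -> float:
    """Chance that at least one human-written paper is wrongly flagged,
    assuming independent flags (a simplifying assumption)."""
    return 1 - (1 - fpr) ** n_papers


N_SUBMISSIONS = 969
CLAIMED_FPR = 0.001  # upper bound of the claimed "<0.1%", treated as per-paper

expected = expected_false_positives(N_SUBMISSIONS, CLAIMED_FPR)
p_any = prob_at_least_one(N_SUBMISSIONS, CLAIMED_FPR)

# Per-paper false positive rate that would be needed for ALL 273 flags
# to be mistakes.
implied_fpr_if_all_wrong = 273 / N_SUBMISSIONS

print(f"expected false positives: {expected:.2f}")
print(f"chance of at least one:   {p_any:.0%}")
print(f"FPR needed to explain all 273 flags: {implied_fpr_if_all_wrong:.1%}")
```

Under these assumptions the claimed rate accounts for roughly one flagged paper, not 273, while still making at least one wrongful flag more likely than not. If the true rate on academic prose is higher than the general-text figure, both numbers grow.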

Samwise's take

What builders need to know

  • If you're submitting to major ML and AI conferences in 2026–2027, assume AI-text detection is part of the review process. NeurIPS will not be the last venue to use Pangram or similar tools.
  • Academic writing conventions (em-dashes, long clause-heavy sentences, dense citation sections) appear to increase Pangram scores even in human-written text. If you use AI as a writing aid, revise the final prose substantially. Don't let AI write text that you then only lightly touch up.
  • Save your drafts. Version history, timed edit logs, timestamped writing sessions — anything documenting when and how the paper was written. If you land in the conditional pool, you'll need evidence of human authorship. Most researchers don't generate this documentation automatically.
  • The no-appeal structure at NeurIPS drew wide criticism. Watch for venue-specific policies on whether appeal mechanisms are included before submitting. This will vary across conferences.
  • Pangram v3.3.2 is the specific tool to be aware of. Even the refined 100-word window configuration flagged papers later found to be human-written. The tool can be run on your own draft before submission — worth doing if your writing style is dense or academic.
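
For the "save your drafts" advice, one low-effort option is an append-only log of draft fingerprints. The sketch below is a minimal example, not a format any venue has endorsed: the function name `record_snapshot`, the log file name, and the fields recorded are all assumptions. A version control system such as git gives you the same trail with more detail if you already use one.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def record_snapshot(draft_path: str, log_path: str = "draft_log.jsonl") -> dict:
    """Append a timestamped fingerprint of the draft to a JSON Lines log.

    Stores a SHA-256 hash, size, and rough word count so the log shows
    how the file changed over time. It does not store the text itself.
    """
    data = Path(draft_path).read_bytes()
    entry = {
        "file": str(draft_path),
        "sha256": hashlib.sha256(data).hexdigest(),
        "bytes": len(data),
        # Whitespace-split count; crude for LaTeX, but fine for tracking change.
        "word_count": len(data.decode("utf-8", errors="replace").split()),
        "recorded_at_utc": datetime.now(timezone.utc).isoformat(),
    }
    # Append mode keeps earlier entries intact.
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
    return entry


# Example (hypothetical file name): run after each writing session.
# record_snapshot("position_paper.tex")
```

A local log like this is only as trustworthy as the machine it lives on, so pairing it with copies of the drafts themselves, or with commits pushed to a remote repository, makes for stronger evidence.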
