On 969 submissions, a 27-point detection gap versus the prior ICLR run, why em-dashes became evidence, and what it means that the field now needs AI to police AI-written research.
NeurIPS caught AI writing the AI papers. Now everyone's arguing about the detector.
Anti-AI 0 · Skeptic 2 · Neutral 0 · Pro (practical) 1 · Pro (hyped) 0 (scale: Anti-AI to Pro-AI)
The NeurIPS 2026 Position Paper Track received 969 submissions and ran every one through Pangram v3.3.2. What came back: 273 papers — 28.2% of the total — had every analyzed text window classified as AI-generated. NeurIPS desk-rejected 178 without appeal. Another 123 were given until June 15, 2026 to produce evidence of substantial human authorship or face rejection.
The same Pangram tool, applied previously to accepted papers from ICLR 2026, flagged 1%.
28.2% vs 1%. The gap between those two numbers is the whole story.
The Position Paper Track's policy required papers to be substantially human-written — AI tools allowed for research assistance and copy-editing, but the final paper must be human prose. A reasonable line to try to hold. The enforcement mechanism — an AI detector catching AI writing — is where things got complicated.
Pangram works by breaking documents into text windows and scoring each window's probability of being AI-generated. The default configuration, applied first to the NeurIPS submissions, flagged 42.7% of papers. NeurIPS then switched to refined 100-word windows, and the rate fell to 12.7%. The final rejection set was drawn from papers where 100% of windows scored as AI-generated, which gives the 28.2% detection figure (273 of 969) and the 18.4% desk-rejection rate (178 of 969).
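To make the window mechanics concrete, here is a minimal sketch of window-level scoring. It is not Pangram's implementation or API: `toy_score` is a hypothetical stand-in scorer, the non-overlapping split and the 0.5 threshold are assumptions, and only the 100-word window size and the "all windows flagged" rule come from the account above.

```python
from typing import Callable, Dict, List


def split_into_windows(text: str, window_words: int = 100) -> List[str]:
    """Split text into consecutive, non-overlapping windows of up to window_words words.

    Non-overlapping windows are an assumption; the real tool may split differently.
    """
    words = text.split()
    return [
        " ".join(words[i:i + window_words])
        for i in range(0, len(words), window_words)
    ]


def fraction_flagged(
    windows: List[str],
    score: Callable[[str], float],
    threshold: float = 0.5,  # assumed cut-off, not a published Pangram value
) -> float:
    """Return the share of windows whose AI-probability score meets the threshold."""
    if not windows:
        return 0.0
    flagged = sum(1 for w in windows if score(w) >= threshold)
    return flagged / len(windows)


def toy_score(window: str) -> float:
    """Hypothetical stand-in scorer, NOT Pangram's model: keys on one marker word."""
    return 0.9 if "delve" in window.lower() else 0.1


def paper_decision(text: str, window_words: int) -> Dict[str, object]:
    """Summarize one paper under two possible flagging rules."""
    windows = split_into_windows(text, window_words)
    share = fraction_flagged(windows, toy_score)
    return {
        "windows": len(windows),
        "share_flagged": share,
        "any_window_rule": share > 0.0,   # flag if at least one window scores as AI
        "all_windows_rule": share == 1.0,  # flag only if every window scores as AI
    }


if __name__ == "__main__":
    sample = "delve " * 50 + "plain " * 150  # 200-word toy document
    for size in (50, 100, 200):
        print(size, paper_decision(sample, size))
```

On this toy document the same text is 25% flagged with 50-word windows, 50% flagged with 100-word windows, and 100% flagged with a single 200-word window, so the paper passes or fails the all-windows rule depending only on the window size.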
A 30-point swing in flagging rate from a configuration change that submitting authors had no visibility into, no ability to anticipate, and no recourse against.
The backlash started quickly. Researcher Pasquale Minervini and others found their human-written papers flagged — with evidence pointing toward academic writing conventions pushing Pangram scores up. Em-dashes, dense citation passages, long complex sentences. Pangram's claimed false positive rate is under 0.1% — a rate validated on general text, not academic writing distributions.
| Metric | ICLR 2026 (accepted papers) | NeurIPS 2026 (submissions) |
|---|---|---|
| Papers analyzed | Accepted set only | 969 submitted |
| AI detection rate | 1% | 28.2% (273 papers) |
| Window configuration effect | Standard | 42.7% default → 12.7% refined |
| Rejection threshold | None (audit only) | 100% Pangram score |
| Papers rejected | 0 | 178 without appeal |
| Conditional path | None | 123 papers, June 15 deadline |
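The percentages in the table can be re-derived from the raw counts reported in this article. The short check below does only that arithmetic; the counts and the 1%, 42.7%, and 12.7% figures are taken as given, not independently verified.

```python
# Counts and rates as reported in the article.
SUBMISSIONS = 969
ALL_WINDOWS_FLAGGED = 273
DESK_REJECTED = 178
ICLR_RATE_PCT = 1.0
DEFAULT_CONFIG_PCT = 42.7
REFINED_CONFIG_PCT = 12.7


def pct(count: int, total: int) -> float:
    """Percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


detection_rate = pct(ALL_WINDOWS_FLAGGED, SUBMISSIONS)
desk_rejection_rate = pct(DESK_REJECTED, SUBMISSIONS)
gap_vs_iclr = round(detection_rate - ICLR_RATE_PCT, 1)
config_swing = round(DEFAULT_CONFIG_PCT - REFINED_CONFIG_PCT, 1)

if __name__ == "__main__":
    print(f"detection rate:      {detection_rate}%")
    print(f"desk-rejection rate: {desk_rejection_rate}%")
    print(f"gap vs ICLR:         {gap_vs_iclr} points")
    print(f"configuration swing: {config_swing} points")
```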
Source spread
- NeurIPS Blog — AI-Generated Papers in the Position Paper Track [hype] — The conference's own account; justifies the methodology, cites Pangram's <0.1% false positive rate as adequate validation.
- AI Front Page — 1/3rd of NeurIPS Submissions AI Generated [builder] — Field context and numbers.
- AI Weekly — NeurIPS Rejects 18.4% via Pangram [skeptic] — Desk-rejection specifics; the 42.7% → 12.7% window sensitivity swing.
- Startup Fortune — NeurIPS Facing Backlash [skeptic] — Researcher responses; em-dash issue; no-appeals criticism.
Pros & cons
What's real:
- Something unusual happened in this submission pool. A 1% detection rate on ICLR accepted papers vs 28.2% on NeurIPS position paper submissions is too large a gap to resolve as entirely false positives — even accounting for Pangram's possible miscalibration on academic text, there's a real signal in there. AI-generated academic paper submissions are happening.
- NeurIPS's policy is reasonable. Requiring substantially human-written papers, with AI allowed for research assistance and copy-editing, is a legitimate line to draw. The alternative — no enforcement — is also not good.
- Desk-rejecting papers with 100% Pangram scores is at least a defined bright line, not an arbitrary one. The methodology was published. Submitting authors knew the policy existed.
What deserves a side-eye:
- A 30-point swing in detection rate (42.7% → 12.7%) based on window-size configuration choices is not a stable measurement. It's a result that depends on implementation details that submitting authors can't see. "We validated this at a <0.1% false positive rate (FPR) on general text" is not the same as "we validated this at a <0.1% FPR on the academic writing distribution we applied it to."
- Academic writing has distinctive patterns — em-dashes, complex sentence structure, dense citation blocks — that look different from ordinary text and may push Pangram scores upward systematically. The ICLR validation is the only external baseline we have, and that was on accepted papers, not submitted papers. Different distribution.
- No appeal. That's the hardest part to defend. When the cost of a false positive is destroying a researcher's conference submission without recourse, the acceptable false positive rate approaches zero. Pangram's is not zero.
- The June 15 deadline for conditional rejections has now passed. We don't have public data on how many of the 123 were ultimately rejected vs reinstated. That absence of transparency doesn't help NeurIPS's case.
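Both halves of this ledger rest on the same arithmetic: how many flags could false positives plausibly explain? The sketch below treats the false positive rate as a per-paper probability, which is an assumption, since the article does not say whether Pangram's claimed figure is per window or per document. Only the 0.1% rate is a claimed figure; the higher rates are hypothetical stress tests.

```python
SUBMISSIONS = 969  # from the article
FLAGGED = 273      # papers with every window classified as AI-generated


def expected_false_positives(papers: int, fpr: float) -> float:
    """Expected number of wrongly flagged papers if `papers` human-written
    papers each face a per-paper false positive rate `fpr`."""
    return papers * fpr


def min_true_detections(flagged: int, papers: int, fpr: float) -> float:
    """Lower bound (in expectation) on genuine detections.

    Worst case for the detector: assume every submission was human-written
    and therefore exposed to the false positive rate.
    """
    return max(0.0, flagged - expected_false_positives(papers, fpr))


def break_even_fpr(flagged: int, papers: int) -> float:
    """Per-paper FPR at which false positives alone would account for every flag."""
    return flagged / papers


if __name__ == "__main__":
    # 0.001 is the claimed rate; the others are hypothetical.
    for fpr in (0.001, 0.01, 0.05, 0.10):
        fp = expected_false_positives(SUBMISSIONS, fpr)
        tp = min_true_detections(FLAGGED, SUBMISSIONS, fpr)
        print(f"FPR {fpr:.1%}: ~{fp:.1f} false positives, at least ~{tp:.1f} genuine")
    print(f"break-even FPR: {break_even_fpr(FLAGGED, SUBMISSIONS):.1%}")
```

At the claimed 0.1% rate, roughly one wrongly flagged paper is expected across 969 submissions. For false positives to explain all 273 flags, the per-paper rate would need to be about 28%, which is why the gap likely contains real signal even if the tool is miscalibrated on academic text. With no appeal, though, even a handful of expected errors are rejections nobody can contest.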
Samwise's take
What builders need to know
- If you're submitting to major ML and AI conferences in 2026–2027, assume AI-text detection is part of the review process. NeurIPS will not be the last venue to use Pangram or similar tools.
- Academic writing conventions (em-dashes, long clause-heavy sentences, dense citation sections) appear to increase Pangram scores even in human-written text. If you use AI as a writing aid, revise the final prose substantially. Don't let AI write text that you then only lightly touch up.
- Save your drafts. Version history, timed edit logs, timestamped writing sessions — anything documenting when and how the paper was written. If you land in the conditional pool, you'll need evidence of human authorship. Most researchers don't generate this documentation automatically.
- The no-appeal structure at NeurIPS drew wide criticism. Before submitting, check whether the venue's policy includes an appeal mechanism. This will vary across conferences.
- Pangram v3.3.2 is the specific tool to be aware of. Even the refined 100-word window configuration flagged papers later found to be human-written. The tool can be run on your own draft before submission — worth doing if your writing style is dense or academic.
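For the "save your drafts" advice, one low-effort option is a timestamped snapshot log. The sketch below is illustrative only: the function names and the JSON Lines log format are hypothetical, and nothing here says what evidence any venue would accept. A self-kept log is also self-attested, so version control history or a hosted editor's revision history is likely stronger where available.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def snapshot_entry(draft_text: str, note: str = "") -> dict:
    """Build one log record: UTC timestamp, content hash, and word count."""
    return {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(draft_text.encode("utf-8")).hexdigest(),
        "word_count": len(draft_text.split()),
        "note": note,
    }


def append_snapshot(log_path: Path, draft_text: str, note: str = "") -> dict:
    """Append a snapshot record to a JSON Lines log and return it."""
    entry = snapshot_entry(draft_text, note)
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return entry


if __name__ == "__main__":
    # Hypothetical file names; point these at your own draft and log.
    draft = Path("paper_draft.txt")
    if draft.exists():
        record = append_snapshot(
            Path("draft_log.jsonl"),
            draft.read_text(encoding="utf-8"),
            note="end of writing session",
        )
        print(record)
```

Running this at the end of each writing session leaves a trail of hashes and word counts showing how the text grew over time. Keep the matching draft copies too, since a hash is only useful alongside the file it was computed from.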
Further reading
- NeurIPS Blog — AI-Generated Papers in the NeurIPS 2026 Position Paper Track — primary source
- AI Weekly — NeurIPS Rejects 18.4% of Position Papers via Pangram AI Tool — desk-rejection details and window sensitivity
- Pangram Labs — Predicts 21% of ICLR Reviews Are AI-Generated — Pangram's own methodology and ICLR application
- AI Front Page — NeurIPS Position Paper Track: 1/3rd Submissions AI Generated — field context
- Startup Fortune — NeurIPS Facing Backlash Over AI Detector Desk Rejections — researcher reaction and false positive cases