On why Uber burned through a year of AI budget in four months, what COO Andrew Macdonald said about the missing ROI link, and why 'measure everything except results' was always going to hit a wall.
The tokenmaxxing era is ending. Uber's $1,500 cap is the obituary.
Stance spread (Anti-AI to Pro-AI): Anti-AI 0 · Skeptic 1 · Neutral 0 · Pro (practical) 2 · Pro (hyped) 0
By April 2026, Uber had consumed its entire 2026 AI coding tools budget. Four months in. The company responded in June by capping employee spending at $1,500 per month per agentic coding tool: Claude Code, Cursor, and whatever else was in the stack. Of the connection between Uber's rising Claude Code usage and actual innovations serving consumers, President and COO Andrew Macdonald told Fortune: "That link is not there yet."
The explanation for how Uber got there is almost too neat. An internal leaderboard. Teams were ranked by total token consumption: not by outcomes, not by bugs fixed per dollar, not by time-to-feature. Token volume. The field has a name for this now: tokenmaxxing. And Uber is far from the only enterprise that built one of these leaderboards and then wondered where the budget went.
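To make the incentive problem concrete, here is a minimal sketch of the same teams ranked two ways: by raw token volume, as the leaderboard did, and by dollars per merged change. The team names, the numbers, and the choice of "merged changes" as the outcome metric are all made up for illustration; nothing here reflects Uber's actual data or tooling.

```python
from dataclasses import dataclass


@dataclass
class TeamUsage:
    name: str
    tokens: int          # total tokens consumed in the period
    spend_usd: float     # what those tokens cost
    merged_changes: int  # hypothetical outcome metric: changes merged to main


def rank_by_tokens(teams):
    """The leaderboard as described: most tokens wins."""
    return [t.name for t in sorted(teams, key=lambda t: t.tokens, reverse=True)]


def cost_per_change(team):
    """Dollars spent per merged change; infinite if nothing was merged."""
    if team.merged_changes == 0:
        return float("inf")
    return team.spend_usd / team.merged_changes


def rank_by_cost_per_change(teams):
    """An outcome-based alternative: cheapest merged change wins."""
    return [t.name for t in sorted(teams, key=cost_per_change)]


# Made-up numbers, for illustration only.
teams = [
    TeamUsage("alpha", tokens=900_000_000, spend_usd=12_000.0, merged_changes=40),
    TeamUsage("beta", tokens=300_000_000, spend_usd=4_000.0, merged_changes=80),
    TeamUsage("gamma", tokens=600_000_000, spend_usd=8_000.0, merged_changes=0),
]

print(rank_by_tokens(teams))           # ['alpha', 'gamma', 'beta']
print(rank_by_cost_per_change(teams))  # ['beta', 'alpha', 'gamma']
```

In this toy data the team that places last on tokens is the cheapest per merged change, and a team that merged nothing places second on the token board. Merged changes is itself a gameable proxy; the point is only that the ranking flips once any outcome enters the denominator.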
CNBC's June 26 report framed this as a broader industry inflection: companies that once told developers to use frontier AI as much as possible now want clear ROI, tighter spending controls, and model routing that doesn't default to the most expensive option. The CEO of AI startup Lindy switched entirely from Claude to DeepSeek and said the cost curve "went down dramatically." OpenAI is reportedly considering price cuts to hold on to customers who are starting to shop around.
Source spread
- Fortune — Uber burned through its 2026 AI budget in four months [builder] — Primary source. Macdonald on-record; context about the leaderboard structure and budget overrun.
- TechCrunch — Uber caps employee AI spending [builder] — The specific $1,500/month cap; the 11% AI-written backend code figure.
- CNBC — Users shift from tokenmaxxing to efficiency [skeptic] — Industry synthesis: Lindy CEO switching to DeepSeek, pressure on OpenAI pricing, enterprise spending tiers proliferating.
- Inc — Uber blew through its 2026 AI budget [skeptic] — Business-press framing; the ROI question as a governance story.
Pros & cons
What's real:
- The Uber story confirms something that should have been obvious earlier: a leaderboard incentivizing token volume will produce token volume, not outcomes. That's basic incentive design. The fact that a large, sophisticated engineering org ran this experiment and had to learn the hard way is at least honest data about how enterprises actually adopt new technology.
- The efficiency shift is healthy. Frontier model prices have been dropping for 18 months. Routing harder tasks to Claude 4.7 and simpler ones to a smaller open-weight model is legitimate engineering. The "use the most capable model for everything" era was partly a failure to build routing logic, not a principled stance.
- OpenAI reportedly considering price cuts, if accurate, would accelerate the routing math and make efficiency plays more accessible to teams that haven't yet built the infrastructure to route intelligently.
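The routing math in the points above can be sketched in a few lines. This is a toy rule-based router, not a real gateway: the tier names, the per-million-token prices, the task labels, and the workload are all placeholders chosen so the arithmetic is easy to follow. Substitute your own price sheet and task taxonomy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    usd_per_million_tokens: float  # placeholder price, not a real quote


# Hypothetical tiers; names and prices are assumptions for illustration.
FRONTIER = ModelTier("frontier-model", 15.0)
SMALL = ModelTier("small-open-weight-model", 1.0)

# Hypothetical labels for tasks judged safe to send to the cheaper tier.
EASY_TASKS = {"unit_test_boilerplate", "docstring", "rename_symbol"}


def route(task_type: str) -> ModelTier:
    """Send known-easy tasks to the small tier; everything else to frontier."""
    return SMALL if task_type in EASY_TASKS else FRONTIER


def estimated_cost(workload, router) -> float:
    """workload is a list of (task_type, tokens) pairs."""
    return sum(
        tokens / 1_000_000 * router(task_type).usd_per_million_tokens
        for task_type, tokens in workload
    )


# Made-up workload: 10M tokens, 80% of them on easy tasks.
workload = [
    ("multi_file_refactor", 2_000_000),
    ("unit_test_boilerplate", 4_000_000),
    ("docstring", 2_000_000),
    ("rename_symbol", 2_000_000),
]

all_frontier = estimated_cost(workload, lambda _task: FRONTIER)
routed = estimated_cost(workload, route)
print(all_frontier, routed)  # 150.0 38.0
```

Unknown task types fall through to the frontier tier on purpose, so a mislabeled task costs money rather than quality. Whether a given task type is actually safe on the cheaper tier is an empirical question this sketch does not answer.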
What deserves a side-eye:
- "11% of Uber's live backend code is fully AI-written" is being framed as the problem. It may not be. The problem is Uber used token count as the proxy for value — not code quality, not feature velocity, not defect rate. Macdonald can't draw the ROI line because Uber never measured the right things in the first place. Fix the measurement before declaring the tool broken.
- The tokenmaxxing framing may overcorrect. Enterprise companies swinging from "spend freely" to "cap at $1,500" with nothing measured in between are not doing AI strategy. They're reacting to a budget scare. Those aren't the same thing.
- Lindy switching to DeepSeek is one data point. Whether the quality gap is acceptable depends entirely on the specific workload. The CNBC piece treats it as evidence of a commodity shift; it may be evidence of one startup's particular tasks being fine on a cheaper model. Your distribution may differ.
Samwise's take
What builders need to know
- If your team uses a single frontier model for everything, you are probably leaving money on the table. Build or adopt a routing layer now: LiteLLM, LLM Gateway, or custom logic. The infrastructure cost is typically small compared with the savings.
- Measure before you cap. Implementing spending controls before you know what the token spend was buying means you'll cut productive work as fast as wasteful work. Establish a baseline first.
- Flat per-person caps don't discriminate between high-value and wasteful usage. A senior engineer doing complex agentic work on a hard problem may legitimately need more than $1,500/month. Smarter controls measure output, not input.
- If you haven't run evals on your primary AI coding setup, start now. The ROI conversation is coming to your org whether you initiate it or not — better to have data before the budget review than after it.
- DeepSeek is a legitimate option for many tasks. It is not a drop-in replacement for Claude 4.7 on complex reasoning, long-context work, or instruction-following edge cases. Benchmark your actual workload distribution before routing decisions.
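As a rough illustration of "measure before you cap", the sketch below takes a baseline month of per-engineer spend, compares each person's cost per merged change with the team median, and then looks at who a flat $1,500 cap would actually clip. The engineers, the figures, and the use of merged changes as the outcome metric are hypothetical; only the $1,500 cap comes from the reporting.

```python
from dataclasses import dataclass
from statistics import median

MONTHLY_CAP_USD = 1_500.0  # the per-tool monthly cap reported for Uber


@dataclass
class EngineerMonth:
    name: str
    spend_usd: float     # baseline spend on one tool, before any cap
    merged_changes: int  # hypothetical outcome metric


def cost_per_change(e: EngineerMonth) -> float:
    """Dollars per merged change; infinite if nothing was merged."""
    if e.merged_changes == 0:
        return float("inf")
    return e.spend_usd / e.merged_changes


def team_median_cost(baseline) -> float:
    """Median cost per merged change across the team."""
    return median(cost_per_change(e) for e in baseline)


def review_over_cap(baseline, cap=MONTHLY_CAP_USD):
    """Label each engineer above the cap by efficiency relative to the median."""
    benchmark = team_median_cost(baseline)
    verdicts = {}
    for e in baseline:
        if e.spend_usd <= cap:
            continue  # a flat cap would not touch this person
        if cost_per_change(e) <= benchmark:
            verdicts[e.name] = "efficient: a flat cap would cut productive work"
        else:
            verdicts[e.name] = "inefficient: review before funding more"
    return verdicts


# Made-up baseline month, for illustration only.
baseline = [
    EngineerMonth("ana", spend_usd=2_400.0, merged_changes=60),
    EngineerMonth("ben", spend_usd=2_100.0, merged_changes=7),
    EngineerMonth("chi", spend_usd=900.0, merged_changes=15),
    EngineerMonth("dee", spend_usd=600.0, merged_changes=6),
]

print(team_median_cost(baseline))  # 80.0
print(review_over_cap(baseline))
```

In this toy data the cap clips two people: one is the most efficient engineer on the team and the other is the least. A flat cap treats them identically, which is the argument for having the baseline in hand before setting the number.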
Further reading
- Fortune — Uber burned through its entire 2026 AI budget in four months — primary source; COO on-record
- TechCrunch — Uber caps employee AI spending after blowing through budget — cap specifics and usage stats
- CNBC — OpenAI and Anthropic face new AI reality as users shift from tokenmaxxing to efficiency — industry-wide framing, June 26
- Inc — Uber blew through 2026 AI budget in four months — business-press context