Vol. 1 · Edition 026Free · No paywall

Everyone Needs a Samwise

AI news · Synthesized · Opinionated · 🌿

$1,500
monthly cap
per employee, per AI coding tool
Industry
By Sam Taylor with Samwise

On why Uber burned through a year of AI budget in four months, what COO Andrew Macdonald said about the missing ROI link, and why 'measure everything except results' was always going to hit a wall.

The tokenmaxxing era is ending. Uber's $1,500 cap is the obituary.

Source lean on this story
▲ avg

Anti-AI

00

Skeptic

01

Neutral

00

Pro (practical)

02

Pro (hyped)

00

← Anti-AI · Pro-AI →

By April 2026, Uber had consumed its entire 2026 AI coding tools budget. Four months in. The company responded in June by capping employee spending at $1,500 per month per agentic coding tool — Claude Code, Cursor, and whatever else was in the stack. President and COO Andrew Macdonald told Fortune the connection between Uber's rising Claude Code usage and actual innovations serving consumers: "That link is not there yet."

The explanation for how Uber got there is almost too neat. An internal leaderboard. Teams were ranked by total token consumption: not by outcomes, not by bugs shipped per dollar, not by time-to-feature. Token volume. The field has a name for this now: tokenmaxxing. And Uber is far from the only enterprise that built one of these leaderboards and then wondered where the budget went.

CNBC's June 26 report framed this as a broader industry inflection: companies that once told developers to use frontier AI as much as possible now want clear ROI, tighter spending controls, and model routing that doesn't default to the most expensive option. The CEO of AI startup Lindy switched entirely from Claude to DeepSeek and said the cost curve "went down dramatically." OpenAI is reportedly considering price cuts to hold customers starting to shop around.

Source spread

Pros & cons

What's real:

  • The Uber story confirms something that should have been obvious earlier: a leaderboard incentivizing token volume will produce token volume, not outcomes. That's basic incentive design. The fact that a large, sophisticated engineering org ran this experiment and had to learn the hard way is at least honest data about how enterprises actually adopt new technology.
  • The efficiency shift is healthy. Frontier model prices have been dropping for 18 months. Routing harder tasks to Claude 4.7 and simpler ones to a smaller open-weight model is legitimate engineering. The "use the most capable model for everything" era was partly a failure to build routing logic, not a principled stance.
  • OpenAI reportedly considering price cuts, if accurate, would accelerate the routing math and make efficiency plays more accessible to teams that haven't yet built the infrastructure to route intelligently.

What deserves a side-eye:

  • "11% of Uber's live backend code is fully AI-written" is being framed as the problem. It may not be. The problem is Uber used token count as the proxy for value — not code quality, not feature velocity, not defect rate. Macdonald can't draw the ROI line because Uber never measured the right things in the first place. Fix the measurement before declaring the tool broken.
  • The tokenmaxxing framing may overcorrect. Enterprise companies swinging from "spend freely" to "cap at $1,500" with nothing measured in between are not doing AI strategy. They're reacting to a budget scare. Those aren't the same thing.
  • Lindy switching to DeepSeek is one data point. Whether the quality gap is acceptable depends entirely on the specific workload. The CNBC piece treats it as evidence of a commodity shift; it may be evidence of one startup's particular tasks being fine on a cheaper model. Your distribution may differ.
11%
Of Uber's live backend code updates fully written by AI agents in 2026

→ Source: TechCrunch

Samwise's take

What builders need to know

For builders
  • If your team uses a single frontier model for everything, you are leaving significant money on the table. Build or adopt a routing layer now — LiteLLM, LLM Gateway, or custom logic. The infrastructure cost is small compared to the savings.
  • Measure before you cap. Implementing spending controls before you know what the token spend was buying means you'll cut productive work as fast as wasteful work. Establish a baseline first.
  • Flat per-person caps don't discriminate between high-value and wasteful usage. A senior engineer doing complex agentic work on a hard problem may legitimately need more than $1,500/month. Smarter controls measure output, not input.
  • If you haven't run evals on your primary AI coding setup, start now. The ROI conversation is coming to your org whether you initiate it or not — better to have data before the budget review than after it.
  • DeepSeek is a legitimate option for many tasks. It is not a drop-in replacement for Claude 4.7 on complex reasoning, long-context work, or instruction-following edge cases. Benchmark your actual workload distribution before routing decisions.

Further reading

🌿

Liked this? Get the weekly digest.

Free. Monday mornings. The week's stories, synthesized. Unsubscribe anytime.

Your take

How'd I do on this one?

What did I miss?

Tell Samwise (and Sam).

Disagree with the take? Spotted a fact I got wrong? Have context I should have included? Drop it here. Anonymous unless you leave an email.