AI’s Blind Geniuses

Everyone’s measuring AI adoption. Nobody’s measuring AI results. If Jensen Huang and Alfred Lin can’t agree on a scorecard, that tells you more about the state of AI than any benchmark can.

THE NUMBER: 0.37% or 100% — depending on whom you ask: the best score any AI has achieved on ARC-AGI-3 (Gemini 3.1 Pro’s 0.37%), or Jensen Huang’s claim that we’ve already reached AGI. Even among the most credible voices in AI, nobody can agree on whether we’re at the starting line or the finish line. That uncertainty isn’t a bug. It’s the operating environment. And it’s exactly why the question of how you measure AI matters more right now than the question of how good AI is.

At GTC two weeks ago, Jensen Huang told the All-In crew that a $500,000 engineer who doesn’t consume $250,000 in tokens should set off alarm bells. Half your salary, burned in AI compute, or you’re not doing your job. He wants engineers treating tokens like oxygen.

On the same podcast, Jason Calacanis pushed a different metric: count the number of new AI tools each person brings into production. Ship or get out. Harry Stebbings has been running the same playbook — the measure of an employee is the AI products they’ve deployed.

Then Alfred Lin from Sequoia posted a thread that quietly dismantled both arguments. His point: when AI commoditizes execution, competitive advantage shifts to decision quality. Strong teams with clear strategy get faster. Weak teams with vague strategy get noisier. Measuring token burn or tools shipped tells you about activity, not outcome. It’s vanity metrics dressed up as productivity.

And Kevin Dahlstrom — a guy who’s spent his career in growth and measurement — quote-tweeted a chart of marathon finishing times clustered at round numbers and captioned it with five words that tie it all together: “What gets measured gets managed.”

He’s right. It’s the oldest trick in management science. You tell engineers their value is measured in tokens consumed, and they’ll consume tokens. You tell them it’s tools shipped, and they’ll ship tools — whether the business needs them or not. You tell them it’s judgment, and… well, good luck measuring that, Alfred.

🧠 Here’s the thing: they’re all correct, and they’re all wrong. Jensen’s selling chips. His incentive is token volume. Jason’s selling founder content. His incentive is hustle narratives. Alfred’s deploying capital. His incentive is decision quality at the companies he backs. Show me the incentive and I’ll show you the metric. Charlie Munger would’ve had a field day.

The Measurement Crisis Nobody’s Talking About

The AI industry has a measurement problem, and it isn’t the one you think.

It’s not about benchmarks — although the fact that every frontier model scored under 1% on ARC-AGI-3 while Jensen declares AGI mission-accomplished should give everyone pause. Gemini 3.1 Pro tops out at 0.37%. GPT-5.4 hits 0.26%. Claude Opus 4.6 lands at 0.25%. Grok 4.2 scored literally zero. Humans solve these problems on first contact, scoring 100%. There’s a $2M Kaggle prize open right now for anyone who can close that gap.

But the measurement crisis that actually matters for your business isn’t whether AI is intelligent. It’s whether the people deploying AI in your organization are making you money or making you busy.

Codie Sanchez nailed the operator’s version of this: find the person on your team with “AI Derangement Syndrome” — the one who’s already built three internal tools over the weekend and Slacked them to the team. Don’t shut them down. Fund them. Get out of their way.

That’s closer to right than any of the metrics above. Because Codie’s not measuring inputs. She’s pointing at the person who’s producing outputs nobody asked for — and saying the signal is in the self-direction, not the token receipt.

The real story: We’re in the middle of AI’s measurement crisis. Every technology cycle has one. In the dotcom era, we measured “eyeballs” and “page views” until the crash taught us that revenue was the only metric that mattered. In mobile, we measured app downloads until retention rates exposed the vanity of install counts. In SaaS, we measured MRR growth until unit economics revealed that some of the fastest-growing companies were also the fastest-burning.

AI’s version of this: we’re measuring tokens consumed, tools deployed, and benchmark scores when the only question that matters is: did the business get better?

The Case for Over-Hiring (Yes, Really)

Here’s where we’ll lose some people, and that’s fine.

Every newsletter, every VC, every management consultant is writing about efficiency. Cut headcount. Replace humans with agents. The math is simple: one agent costs $20/month, one employee costs $200K/year.

We think the near-term move is the opposite. Over-hire.

Not recklessly. Strategically. Bring in more potential AI-native operators than you think you need. Give them budgets — token budgets, tool budgets, time. Provide clear top-level strategy and then do the hardest thing a manager can do: get out of the way.

Here’s what happens. Within 90 days, the 10X people self-identify. They’re the ones who’ve already automated half their workflow, built tools for their teammates, and started asking questions about parts of the business they weren’t hired to touch. They don’t need direction. They need runway.

The others — the ones who need hand-holding, who wait for instructions, who use AI to do the same work at the same speed with slightly better formatting — they self-identify too.

Now you have information you couldn’t have gotten any other way. Not from interviews. Not from résumés. Not from asking someone to “describe a time they used AI to solve a problem.” You have 90 days of actual production data on who creates value when the tools are unlimited and the leash is long.

The math: Take the detractors’ fully loaded cost — salary, benefits, equipment, management overhead — and redistribute 50% of it to the 10X team. The other 50% drops to your bottom line. Your top performers get paid more (which means they don’t leave for your competitor), your output goes up, and your management burden goes down because the people who remain don’t need managing.
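To make the arithmetic concrete, here’s a toy model of that 50/50 redistribution. Every figure in it — three detractors, a $250K fully loaded cost, four top performers — is a hypothetical assumption for illustration, not a number from the piece.

```python
# A toy model of the 50/50 redistribution described above.
# All headcounts and dollar figures are hypothetical assumptions.

def redistribute(detractor_costs, top_performer_count, split=0.5):
    """Split the detractors' fully loaded cost between raises and savings."""
    freed = sum(detractor_costs)              # total fully loaded cost freed up
    raise_pool = freed * split                # redistributed to the 10X team
    savings = freed - raise_pool              # drops to the bottom line
    per_performer_raise = raise_pool / top_performer_count
    return raise_pool, savings, per_performer_raise

# Assume 3 detractors at $250K fully loaded each, 4 top performers.
raise_pool, savings, per_head = redistribute([250_000] * 3, 4)
print(raise_pool, savings, per_head)  # 375000.0 375000.0 93750.0
```

Under those assumptions, each top performer gets a $93,750 raise and $375K still falls to the bottom line — which is why the retention argument and the margin argument aren’t in tension.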

Why it matters: The 10X AI-native employee is the scarcest resource in the market right now. Jensen knows it — he’s offering tokens as a recruiting tool because “how many tokens come along with my job” is now a question candidates ask. Steve Huffman at Reddit is going heavy on hiring graduates specifically because they’re “much more AI native” than their older peers. Uber reports 84% of its engineers are already on agentic workflows.

The companies that win the next two years won’t be the ones that cut the deepest. They’ll be the ones that figured out who their 10X people are and gave them the room to run.

⚡ Meanwhile, Google Just Built Pied Piper’s Compression Algorithm

While Silicon Valley argues about how to measure AI usage, Google quietly changed the economics of using it.

TurboQuant — a new compression algorithm from Google Research — reduces LLM memory requirements by 6x with zero accuracy loss and speeds inference by up to 8x on Nvidia H100 GPUs. If you watched HBO’s Silicon Valley, this is Richard Hendricks’ middle-out compression made real. Except instead of compressing video files to disrupt Hooli, it compresses AI model weights to make every deployment dramatically cheaper.

For your reader running AI workloads: models that required a cluster of GPUs can now run on a single card. Models that ran in the cloud can now run locally. The token bill Jensen wants you to burn through? It just got a lot smaller per unit of intelligence.
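The single-card claim is just memory arithmetic. Here’s a back-of-envelope sketch: the 6x figure comes from the TurboQuant claim above, but the model size and the 80 GB H100 capacity framing are illustrative assumptions, not details from Google’s announcement.

```python
# Back-of-envelope memory math for weight compression.
# The 6x reduction is the article's figure; the 70B model is an assumption.

def weight_memory_gb(params_billions, bytes_per_param):
    """Memory needed just to hold the model weights, in GB."""
    return params_billions * 1e9 * bytes_per_param / 1e9

params = 70                                   # hypothetical 70B-parameter model
fp16_gb = weight_memory_gb(params, 2.0)       # 16-bit weights: 2 bytes each
compressed_gb = fp16_gb / 6                   # apply the claimed 6x reduction

print(f"fp16: {fp16_gb:.0f} GB, compressed: {compressed_gb:.1f} GB")
```

For this hypothetical model, 140 GB of fp16 weights needs a multi-GPU cluster; roughly 23 GB fits comfortably on a single 80 GB card — or on a well-equipped workstation.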

Samsung and Micron stocks have already started to reflect this — when you need 6x less memory, the memory chip business feels it.

What this means for operators: The cost curve just bent. If you’ve been waiting to deploy AI because the infrastructure math didn’t work, rerun your models. If you’ve been paying for cloud inference, look at local deployment again. And if you’ve been measuring your AI investment by token spend — congratulations, your most important metric just became unreliable. You can now get more intelligence for fewer tokens, which means token volume as a productivity proxy is already obsolete.

Jensen proposed the $250K token burn metric two weeks ago. Google just made it irrelevant before the ink dried. The Pied Piper dream isn’t fiction anymore — and the implications for who captures value in the AI stack are shifting in real time.

What This Means For You

The AI measurement crisis isn’t academic — it’s costing you money and talent right now. Every metric being pushed by the industry’s loudest voices serves the measurer’s incentive, not yours.

Stop measuring inputs. Start measuring outcomes. Token consumption, tools adopted, and benchmarks hit are vanity metrics. The only question: is AI raising your revenue, cutting your costs, or increasing your velocity? If you can’t answer that clearly, you’re spending without a scorecard.

Identify your 10X people before your competitor does. The AI-native operator who builds without being asked is the most valuable person in your organization. They’re also the most likely to leave if you bury them in approval workflows and committee meetings. Fund them. Promote them. Get out of their way.

Rerun your infrastructure math. TurboQuant means deployment costs just dropped by a factor of six. If you made a build-vs-buy or cloud-vs-local decision more than three months ago, it’s already stale. The economics moved.

Accept the uncertainty. Jensen says AGI is here. ARC-AGI-3 says 0.37%. The honest answer is nobody knows — and anyone selling certainty is selling something else. Build for optionality, not for predictions.

Three Questions We Think You Should Be Asking Yourself

If Jensen’s right that engineers should burn $250K in tokens annually, what happens when Google makes those tokens 6x cheaper — do you need 6x fewer engineers or do you redeploy them? The cost curve is moving faster than most org charts can adapt. If your AI budget is pegged to token volume, you’re about to get a windfall. The question is whether you’ll pocket the savings or reinvest them in the people who know how to use the headroom.

Do you actually know who your 10X AI people are — or are you still measuring everyone the same way? Most companies are still running annual reviews designed for a world where output was roughly proportional to hours worked. That world is gone. The gap between your best AI-native operator and your average employee isn’t 2x anymore. It might be 20x. If you can’t name your top three AI people without thinking about it, you’re managing blind.

Are you building a team that plays offense, or are you cutting costs and calling it strategy? Efficiency is defense. The interesting question isn’t “can I do the same work with fewer people” — it’s “what happens when I give five AI-native operators the budget and autonomy to build things I haven’t thought of yet?” The companies that dominate the next cycle won’t be the leanest. They’ll be the ones that scaled fastest by turning their best people loose.

“Show me the incentives and I’ll show you the behavior.”

— Charlie Munger

— Harry and Anthony
