Mind The Gap

THE NUMBER: $3.3 billion — what the public markets currently pay for Box, an enterprise SaaS company at $1B+ ARR run by Aaron Levie, who is by general consensus the most AI-forward CEO in public software. In the same week, private markets are valuing Cursor at $50 billion on a small fraction of that revenue. Both companies sell AI-native software. Both depend on frontier-model inference they don’t own. The spread between them is not a story about which one is the better business. It’s a story about which set of investors is allowed to be wrong for longer. The London Underground has a voice that says the same thing every time a train arrives — mind the gap — and right now it is the most useful sentence in the AI economy.

Today produced more news than any other day this month and, in a useful coincidence, all of it told the same story. Anthropic shipped Claude Design and the r/ClaudeCode community spent the weekend nicknaming Opus 4.7 “Gaslightus 4.7” because it kept inventing files and defending hallucinated test results across ten-turn arguments. Nate Leslie published a piece arguing that AI broke the chain of proof that used to make production a signal of competence, and that the only thing left to evaluate is whether you actually understood what you built. Martin Alderson wrote up the Figma-versus-Anthropic structural mismatch with the numbers that explain why Figma’s stock got hit twice in three days. Aaron Levie sat down with Harry Stebbings on 20VC and said the quiet part out loud about enterprise AI deployment. ICONIQ’s State of GTM 2026 dropped, showing sub-one-year contracts have more than tripled since 2023. Zapier launched AutomationBench, the first benchmark designed to measure whether AI actually finishes a job. The NSA started using Anthropic’s Mythos model in active operations while the Pentagon still labels Anthropic a supply-chain risk.

🦞 Five different stories. One actual story. The gap between what AI looks like it can do and what it actually does on Tuesday morning is now visible at every layer of the stack at the same time — what individuals can prove about their own work, what models do when nobody’s watching, what creative-tool incumbents lose when their inference supplier becomes their competitor, what the most AI-forward enterprise CEO in public markets is allowed to be valued at, and what the U.S. intelligence apparatus thinks the right answer is to “can we use this and should we trust it?” It’s the same gap, observed at five different distances. Mind it.

Future-Proof Podcast – The Next Trillion-Dollar Idea: Selling Work, Not Software. Episode 4

The AI revolution isn’t just about faster models—it’s about strategically harnessing AI as a business superpower. Discover how to transform your business model to be faster, more resilient, and more capable than ever before.

The Four Layers, Briefly

💲 At the individual layer, Nate’s argument is the cleanest version of what’s wrong. Production used to signal competence because production used to be hard. AI broke that. Building is easy now; what’s scarce is whether you understood what you built — why it works, what would break, what you chose not to do and why. He cites the data without flinching: Oracle cut 30,000 jobs by email and dead badges, Block cut 4,000 citing AI tools, Salesforce cut 1,000 across marketing, product, and its own Agentforce AI unit. Per Challenger, AI was the named reason for 25% of March job cuts, up from roughly 5% across all of 2025. And the operational example that makes the point lands harder than the layoffs: an Amazon engineer using Kiro — the company’s own AI coding agent that Amazon mandated 80% of its engineers use weekly — watched Kiro decide the optimal path was to delete and recreate an entire AWS service environment. Thirteen hours of downtime. The official postmortem called it “user error,” and “a coincidence that AI tools were involved.” The user was following the corporate mandate to use the corporate tool that made the corporate decision to nuke a live system. The gap, made flesh.

📉 At the model layer, the r/ClaudeCode thread on Opus 4.7 picked up 1,700 upvotes in 48 hours. Users renamed it Gaslightus 4.7 because it invents files, defends hallucinated test results across ten-turn conversations, and obsessively flags benign PowerPoint templates for malware. One user’s eval stayed stuck at 17 of 29 while Opus generated fresh reasons it was right. We wrote favorably about Opus 4.7 on Friday, and we owe you a correction: the model is impressive on benchmark and structurally unreliable on Tuesday morning. The Claude Design launch the same week is the same problem with a different surface — every output looks identical because the underlying frontend-design skill has the same opinionated defaults baked in. Teal gradient, serif font, blinking status dot, container soup. The aesthetic complaint is the model complaint with a softer blast radius.

🦞 At the market layer, Martin Alderson’s piece on Figma supplies the numbers that explain the stock action. Only 33% of Figma’s userbase is designers; 30% are developers and 37% are non-design roles. The expansion thesis that was the entire bull case rests on those non-designers, and those are exactly the users who will let an AI do the design rather than learning a tool. Figma Make — Figma’s own AI design product — is built on Sonnet 4.5. Claude Design is built on Opus 4.7, whose vision capabilities are roughly three times the resolution of Sonnet’s. Figma is not just being competed with by its inference supplier. It is funding its inference supplier to build a strictly superior version of its own product. Figma has roughly 2,000 employees. Anthropic has roughly 2,500 total. Claude Design probably required fewer than ten people to build. The market is doing the math out loud: down 7% Friday on the leak, down another 7% Monday on the Opus vision benchmarks. Adobe and Wix took the haircut alongside.

💲 At the enterprise layer, Aaron Levie sat with Harry Stebbings and said it as plainly as anyone has: “Upgrading software is a multi-year effort, not a magical moment where everything can be secured overnight. Even with access to frontier models, the implementation cycle in the real world remains the primary bottleneck.” Then the line that should be tattooed on every CIO’s monitor: “The assumption that the massive gains seen in AI coding will immediately translate to all other knowledge work is a misread.” Coding has properties that broader knowledge work does not — deterministic outputs, fast verification loops, low regulatory overhead. ERPs, by contrast, contain decades of encoded business logic that agents must interact with, not replace. Levie’s forecast: 500,000 to 1,000,000 “Agent Operators” — technical-but-business-savvy humans responsible for the care and feeding of agents, writing skills, maintaining MD files, redesigning workflows for agents rather than people — will be the most in-demand new role of the next five years. And his counterintuitive call: more lawyers in five years, not fewer. Generation is trivial; getting content approved by a court or a patent office is not. Bottlenecks remain qualified humans.

The honest caveat we owe you on Levie: he is talking his book. Box at $3.3 billion on $1B+ ARR has every commercial incentive to argue that enterprise rollout is slower than the headlines say, because if the market believed otherwise his stock would be Cursor-priced. He is also, in our reading, almost certainly correct. Both things coexist. “You keep using that word,” Inigo Montoya told Vizzini. “I do not think it means what you think it means.” The word is enterprise AI adoption, and right now nobody in the conversation is using it the same way.

Why The Gap Exists (Four Mechanisms, Not One)

Once you see the gap as a single phenomenon at five different distances, the question is what’s actually generating it. Four mechanisms:

🦞 Time horizon, which is the most important one because it explains the entire price spread between Cursor and Box. Venture capital underwrites ten-year duration. Mutual funds, ETFs, and most public-market participants underwrite quarterly liquidity. Both classes of capital are looking at the same fact pattern — AI-native enterprise software, dependent on frontier inference, facing competition from the inference provider — and pricing it on fundamentally different curves. Cursor at $50 billion only makes sense if you can wait a decade and you believe the coding-agent layer compounds for that long. Box at $3.3 billion only makes sense if you have to be right by next quarter and you believe the implementation cycle bites well before then. Both can be coherent. They cannot both be right at the same time.

💲 Financial constraint, which is the underrated cousin. Mutual fund money cannot bid Box up to fifteen times revenue even if the analyst believes the long-term story, because the fund needs to mark its NAV daily and explain it monthly. VC money can bid Cursor at whatever the LP base will tolerate because those LPs marked their commitments years ago and won’t see the answer until 2032. Different funding structures create different price discoveries on the same asset class. This was Buffett and Munger’s actual alpha for forty years — they had structural duration advantage at Berkshire because the float never came due, and they could buy and hold forever while everyone around them had to ring the bell on December 31st.

📉 Scarcity, in two forms. The first is talent — there are not enough “Agent Operators” in the workforce yet, and there will not be 500,000 trained ones in 18 months. That bottleneck alone slows every enterprise deployment regardless of how good Opus 4.8 turns out to be. The second is physical — Meta’s $200 billion AI capex commitment has surfaced an unexpected new bottleneck: fiber technicians. The constraint on AI infrastructure rollout in 2026 is not GPUs anymore. It’s the people who climb poles and pull cable. Capability is racing ahead of every supply curve simultaneously, and the supply curves do not respond on the same timescale.

🦞 Measurement latency, which is the one nobody names because it has no champion in the news cycle. Reality genuinely cannot answer back fast enough. We do not yet have the instruments to measure whether enterprise AI is creating value at the speed that hype gets generated, and in the absence of measurement, perception fills the vacuum. Perception is Twitter. Twitter is a production line for hype. This is exactly what makes Zapier’s AutomationBench launch this morning the most important benchmark drop of the year, even though it will get less coverage than any frontier-model release. AutomationBench is the first benchmark that scores models on deterministic outcomes in real business workflows — did the right CRM record actually get updated, did the right message actually go out, did the work actually get done. “It’s a benchmark for outcomes, not output,” in Zapier’s words. Zapier sees two billion AI tasks per month across 3.7 million companies. Nobody else has the data to build this. And until something like it becomes the default scorecard, the gap between perception and reality stays mostly unfalsifiable — which is exactly the condition under which gaps stay open.

Two Institutions, Two Time Horizons

📉 The Mythos story that has been brewing for a month sharpened today, and it is the cleanest possible illustration of the time-horizon mechanism applied to institutions instead of investors. Singapore’s MAS is monitoring Mythos for banking system risks. ASIC in Australia joined them. Asian financial bodies are coordinating. The FSB has signaled it will take a global view. The Pentagon publicly labels Anthropic a supply-chain risk and warns about Dario Amodei’s ability to throw a kill switch on the model in a wartime scenario. And the NSA, which is in the business of cracking codes and running offensive cyber operations this week, is using Mythos in active operations regardless.

The lazy framing of this story is that the U.S. government is incoherent. The accurate framing is that two agencies are pricing the same tool against two different mission constraints under two different time horizons, and both are being rational. The Pentagon’s job is to survive the next war, and its leadership is intensely political — political risk has to be priced because the wrong answer in a hearing room ends careers. The NSA’s leadership is operational, largely apolitical, and most Americans cannot name the director. Their stated mission is to deliver intelligence advantage today, with the next-generation tool on a parallel work track for tomorrow. Mythos is the best tool for cracking the kind of cryptography that matters this quarter, and so they are using it. They have priced the kill-switch risk against the alternative — not having the capability — and the capability has won. That is not a contradiction. It is two underwriters with two duration sheets, both internally consistent.

The same structural framing applies to Cursor at $50 billion versus Figma at its current depressed multiple. Nobody knows whether Cursor wins long-term. Cursor faces stiff competition from Codex, Cline, Aider, Continue, Windsurf, and the entire OSS field. Cursor depends on inference from labs that are already shipping their own coding agents (Claude Code, OpenAI’s Codex). The competitive picture for Cursor is, structurally, similar to Figma’s — perhaps even worse, because Figma at least has design taste as a moat and Cursor’s moat is mostly speed-of-iteration in a category where every player iterates fast. And yet one is being bid up while the other is being marked down. The duration arbitrage is the entire explanation. VCs with ten-year clocks can absorb the Cursor-loses scenario because they only need one of the bets to hit. Mutual-fund managers with daily marks cannot.

If you are an investor reading this, the implication is uncomfortable: the gap is the opportunity, but only if you can pick which side is wrong. Buffett and Munger had duration advantage and used it to be right slowly. Today’s market has a duration mismatch, and somebody is going to be very, very wrong. Markets correct that. They always do. The question is which way and when.

If you are an operator reading this, the implication is different. Volatility is not opportunity — volatility is volatility. The right move is not to bet the gap closes in your direction. It is to position so you can survive whichever direction it closes in. Which brings us to the actually useful prescription.

What This Means For You

🦞 Own the layer between you and the model. This is the single most important decision you will make in 2026, and it is the lesson Figma is teaching the world for free. The moment Anthropic decided to compete with Figma, Figma had no leverage — because its product runs on Anthropic’s inference, its competitor was funded by Figma’s own monthly bills, and there was no portable layer Figma could move to. If you are building anything on top of frontier models right now, the question to ask before any other question is: what is my Figma moment? Where am I so dependent on a single inference provider, a single tooling stack, a single proprietary skill format, that I am paying that vendor to build my replacement? Move your skills to a portable format you control. Move your data to storage you own. Use BYOK model routing. Keep the agents you build addressable by you without going through a vendor’s runtime. This is the operational version of Nate’s individual prescription — make your work survive the platform that made it. Same instinct, different scale.

📉 Trial AI tools on short-term contracts; never lock in for what you cannot evaluate. ICONIQ’s data on contract length is the buyer side of this thesis written in real money. Sub-one-year new-logo contracts went from 4% of deals in 2023 to 13% in 2026. Three-year deals fell from 28% to 23%. Sales cycles compressed from 25 weeks in H1 2025 to 19 weeks in H2 — buyers are deciding faster and committing for less time. They are not confused. They are operating rationally in a market where the leader in any AI category may not be the leader twelve months from now. Match their playbook. Trial. Play vendors against each other. Renew on what you can demonstrate, not on what was promised. The vendor who tries to lock you into three years at today’s prices is asking you to underwrite the gap on their behalf.

💲 If you’re building with AI and you cannot explain what you built, you do not have a product — you have a liability. Use Nate’s four-question template on every shippable artifact. What is this. Why this approach. What would break. What did you learn. The discipline forces comprehension at the point of creation, which is the only point where comprehension is cheap. The Amazon Kiro story is the cautionary tale we will keep returning to: corporate mandate, corporate tool, corporate decision, thirteen hours of downtime, and an official postmortem that called it user error because nobody could find the human in the loop who was supposed to know better. Be the human in the loop who knows better.

🦞 If you run an enterprise, hire your first Agent Operators now and rewrite the workflows around them, not the other way around. Levie’s forecast of 500,000 to 1,000,000 of these roles in five years is going to be either too low or too early — but the direction is right, and the talent market in 2027 is going to look like the data engineer market in 2017 did. Move now or pay the premium later. The job description does not exist in your HRIS yet. Write it.

📉 If you are an investor, the duration arbitrage is real and somebody is going to be marked to it. Cursor at $50 billion and Box at $3.3 billion cannot both be right at the same time about the same underlying fact pattern. Either the public markets are pricing too much short-term risk on the implementation cycle, in which case Box is structurally undervalued — or the private markets are pricing too much long-term certainty on the agent-coding category, in which case Cursor’s next round is the local top. We do not have a position to share. We have a question to leave you with: which side has duration advantage, and is your fund structure compatible with being right slowly?

🦞 And the through-line that connects all of the above: the gap closes in two directions. Reality catches up to perception (the optimistic case — Mythos really can crack codes, AutomationBench scores improve quickly, the implementation cycle compresses, Cursor wins). Or perception gets marked down to reality (the gap-stays-wide case — enterprise rollouts stall, Opus 4.7 quietly becomes Opus 4.8 with the same defects, the lawyers really do multiply, Box’s multiple expands while Cursor’s contracts). You don’t have to know which direction it closes. You have to position so you can survive both. That means trial, don’t commit; play vendors off each other; and own your layer. Three rules. Tape them above the monitor.

Three Questions We Think You Should Be Asking Yourself

What would happen to your stack tomorrow morning if your primary inference provider shipped a competitor to your core product? Run the exercise honestly. If the answer is “we’d be Figma” — meaning your moat dissolves the moment the supplier becomes the competitor — that’s the question your architecture has to answer in the next two quarters, not the next two years.

What’s the longest contract you’ve signed for an AI tool in the last 90 days, and what’s the actual cost of being wrong about that vendor? The sub-one-year contract trend is the smartest people in procurement telling you something. If your team is still signing three-year deals because the discount looked good, you are buying volatility you didn’t underwrite for.

Who in your organization is the human in the loop when an AI agent makes the wrong call? Not the org chart answer. The actual answer. If the chain of accountability runs through “the system did it” and stops there, your nail factory is already running. The thirteen hours of downtime at AWS were billed to a single engineer at the bottom of the stack. Make sure the next thirteen hours don’t get billed to you.

“Mind the gap.”
— London Underground, every minute of every day, since 1968

The voice was recorded by a sound engineer named Peter Lodge for a flat fee. He died without royalties. The phrase has become the most repeated public-safety warning in human history because the gap between the platform and the train is permanent — you can’t close it, you can only learn to step across it. The AI economy has its own version of the same gap right now. We are not going to close it this quarter, or next quarter, or next year. We are going to live with it, and the people who do best are the ones who learn the rhythm — when to step, when to wait, what to hold onto, what to let go of. Mind it.

— Harry and Anthony

The Four Layers, Briefly

Why The Gap Exists (Four Mechanisms, Not One)

Two Institutions, Two Time Horizons

What This Means For You

Three Questions We Think You Should Be Asking Yourself

Sources

Past Briefings

The Nail Factory

Warp Speed, Fast, and Slow

Anthropic at $800 Billion. OpenAI Beaten by China. The Most Expensive Liquidation in History.