Can Your Budget Tell the AI Story?

Why budget frameworks need to be redesigned for a world where human costs and LLM costs are interchangeable.

In a previous essay, I argued that narratability should be a first-order design consideration for budget management. The core idea: before you finalize any budget framework, ask what the board sentence will be when things don't go according to plan. Design for the story, not just the numbers.

That argument assumed a world where cost categories are stable. Headcount is headcount. Software is software. T&E is T&E. The phasing, the variance management, the communication of results: those were the design variables. The categories themselves were taken as given.

That assumption is starting to break.

We are entering a period where human labor and AI compute are becoming genuine substitutes for certain types of work. A recruiting coordinator who screens resumes can be replaced by an AI workflow that costs a fraction of the salary. A customer support team that handles Tier 1 tickets can be partially automated. An SDR's prospecting workflow can be largely generated by AI, with the human reviewing and personalizing the last mile.

These substitutions are already happening. The problem is that most budget frameworks can't see them, can't narrate them, and can't encourage them. The cost shows up in a different line item, the headcount shows up as an unfilled req, and the connection between the two is invisible. The CFO who wants to tell the board "we're building a more flexible, AI-enabled cost structure" has no data to support the claim, because the framework wasn't designed to produce it.

This essay is about what the framework should look like.

Start with the Board Sentence

The same design principle from the narratability essay applies here: start with the sentence the CFO wants to say, then work backward to the framework that makes it credible.

The sentence most CFOs would want to say is something like: "We grew output this year while keeping headcount flat, because we systematically shifted repeatable work from people to AI. Our cost base is becoming more flexible and our operating leverage is improving."

That's a powerful statement: it signals discipline and structural improvement. For that sentence to survive board scrutiny, the CFO needs three things: evidence that the shift is actually happening, evidence that the shift is producing results, and confidence that the shift is sustainable rather than anecdotal.

Most budget frameworks can't produce any of these, because they weren't designed to observe the shift.

Where This Framework Applies

Before going further, I want to be direct about scope. This framework does not apply uniformly across every function. Pretending otherwise would produce something theoretically elegant but practically useless.

The organizing principle is the nature of the work. Within any function, there are roles that look like operations (high volume, repetitive, measurable throughput) and roles that look like judgment work (qualitative output, strategic thinking, relationship-driven). The operational roles are candidates for this framework regardless of which department they sit in. The judgment roles are not.

Operational work: where the framework lands cleanly

Every large company has pockets of operational work scattered across many departments. What unifies them is that the work produces volume metrics, the tasks are repetitive enough that AI substitution is discrete, and the output can be measured before and after the shift without inventing new tracking systems.

Recruiting is the most straightforward case. Reqs opened, candidates screened, interviews scheduled, time to fill: these metrics are already tracked. If a recruiting team automates candidate screening and handles 40% more req volume with the same team size while their automation spend increases by $80K, the story tells itself.

Customer support follows the same pattern. Tickets resolved, first-response time, resolution time, satisfaction scores. An AI layer that handles Tier 1 tickets and routes complex issues to humans produces measurable throughput changes against a measurable cost shift.

Sales support functions are strong candidates, but quota-carrying reps are not. This distinction matters because at many enterprise SaaS companies, sales is the largest headcount concentration. A quota-carrying rep isn't "labor" in the way this framework uses the term. A carrying rep is an investment with direct revenue attribution: you hire them, assign a quota, and the cost is justified by what they close. The substitution question for carrying reps isn't "can we replace them with AI?" It's "can AI change the unit economics of the rep?" Can they carry a larger book? Can they close faster? Can they handle more pipeline because AI handles research, email drafting, CRM hygiene, and proposal generation? That's a productivity multiplier, not a cost substitution. You measure it through quota attainment per rep and revenue per dollar of sales spend, not through headcount reduction.

The functions around the carrying reps, though, are a different story. SDRs, sales ops, deal desk, proposal teams: these have volume metrics (meetings booked, proposals generated, deals processed), the work is repetitive, and AI substitution is discrete. At a company where 60% of headcount is in sales and marketing, the sales support layer is often where the largest substitution opportunity lives.

Marketing splits along the same operational/judgment line. Demand generation and content operations have volume metrics: MQLs generated, campaigns launched, content pieces produced, emails sent. A content team that produces twice the output with the same headcount because AI drafts first versions is an observable shift. Lifecycle and CRM marketing at consumer/PLG companies is similar: high-volume, rule-based workflows with clear output metrics.

Product marketing and brand marketing are on the other side of the line. A better positioning document or a sharper competitive battlecard doesn't show up in a volume metric. The efficiency gain is real but invisible.

Procurement, accounts payable/receivable, and similar back-office operational functions share the same profile as recruiting and customer support: high volume, repetitive tasks, existing throughput metrics, and discrete substitution opportunities.

Judgment work: where the framework is genuinely hard

Corporate FP&A, accounting, HR business partnering, legal, internal audit, strategic AEs, product management, engineering architecture, and senior leadership all share a common trait: the work doesn't produce structured data that makes human-to-AI substitution measurable.

An FP&A analyst who uses AI to draft variance commentary faster doesn't produce any observable signal that the work was done differently. The commentary shows up in the same format, at the same time, in the same deck. An accountant who uses AI to accelerate reconciliations doesn't generate a data trail that distinguishes "done faster with AI" from "done at normal speed without AI."

For these functions, the honest answer is that this framework doesn't apply in the near term. Standard budget management still works. Forcing an AI substitution framework onto work where the shift can't be observed would create false precision and undermine the credibility of the framework where it does work.

The irony is that many of these functions contain tasks that are ripe for automation precisely because those tasks are repetitive and rule-based: reconciliations and first-draft variance commentary are obvious examples. The barrier is that the work doesn't produce the kind of structured data that would make the substitution measurable. AI can do the work. The measurement infrastructure can't observe that it did. This is a deeper problem that deserves its own treatment, and I won't try to solve it here.

The Design Principles

For the operational work where this framework applies, these principles should guide budget design.

Observe the shift from data that already exists.

The instinct when designing a new framework is to create new reporting: new tags, new forms, new questions for business partners to answer. Resist this. At a high-growth company, any mechanism that requires people to self-report will produce bad data or no data.

Instead, observe the shift at the portfolio level using data that's already being generated. Headcount actuals over time. AI and automation-related spend over time. Whatever output metrics the function already tracks for its own operational purposes. FP&A reads these data streams together with a new question: where is the human-to-AI shift happening, and what's the cost and output impact?
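To make "reading these data streams together" concrete, here is a minimal sketch in Python. It assumes FP&A can already pull three numbers per function per period: headcount actuals, AI/automation spend, and one volume metric the function tracks for itself. The names `FunctionSnapshot`, `shift_signal`, and `looks_like_shift`, and the 10% threshold, are illustrative assumptions of mine, not a standard. The figures in the usage lines are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FunctionSnapshot:
    """One period of existing data for one function (field names are hypothetical)."""
    function: str
    period: str
    headcount: float         # headcount actuals
    automation_spend: float  # AI and automation-related spend
    output_units: float      # the volume metric the function already tracks


def shift_signal(prior: FunctionSnapshot, current: FunctionSnapshot) -> dict:
    """Compare two periods and return the three directional changes."""
    prior_per_head = prior.output_units / prior.headcount
    current_per_head = current.output_units / current.headcount
    return {
        "function": current.function,
        "headcount_change": current.headcount - prior.headcount,
        "automation_spend_change": current.automation_spend - prior.automation_spend,
        "output_per_head_change_pct": round((current_per_head / prior_per_head - 1) * 100, 1),
    }


def looks_like_shift(signal: dict, min_productivity_gain_pct: float = 10.0) -> bool:
    """Flag a function where automation spend rose, headcount did not grow,
    and output per head rose by at least the (assumed) threshold."""
    return (
        signal["automation_spend_change"] > 0
        and signal["headcount_change"] <= 0
        and signal["output_per_head_change_pct"] >= min_productivity_gain_pct
    )


# Hypothetical figures for illustration only.
prior = FunctionSnapshot("recruiting", "prior year", 10, 20_000, 500)
current = FunctionSnapshot("recruiting", "current year", 10, 100_000, 700)
signal = shift_signal(prior, current)
print(signal, looks_like_shift(signal))
```

The flag is a prompt for a conversation with the function, not a finding. It says where to look, using nothing the business had to self-report.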

You won't achieve perfect attribution. You won't be able to say with certainty that $X of automation spend replaced exactly Y headcount. You don't need to. You need directional evidence strong enough to support the board sentence. "Our recruiting function handles 40% more volume with the same team size, and their automation spend increased by $80K" is directional. The board doesn't need causal proof. They need a credible narrative backed by observable data.
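The arithmetic behind a directional claim like the recruiting example can be sketched in a few lines of Python. Only the 40% volume growth and the $80K automation increase come from the example above. The team size of 10 and the fully loaded cost of $120K per head are hypothetical inputs I've added for illustration. The sketch also assumes output per head would have stayed flat without automation, which is exactly the kind of directional assumption the framework accepts in place of causal proof.

```python
def implied_headcount_avoided(team_size: float, volume_growth_pct: float) -> float:
    """Heads that would have been needed to absorb the volume growth
    at the prior output per head (assumes flat productivity without AI)."""
    return team_size * volume_growth_pct / 100


def directional_summary(
    team_size: float,
    volume_growth_pct: float,
    automation_spend_increase: float,
    loaded_cost_per_head: float,
) -> dict:
    """Compare the implied cost of hiring for the growth against the automation spend."""
    avoided = implied_headcount_avoided(team_size, volume_growth_pct)
    avoided_cost = avoided * loaded_cost_per_head
    return {
        "implied_headcount_avoided": avoided,
        "implied_cost_avoided": avoided_cost,
        "automation_spend_increase": automation_spend_increase,
        "net_directional_benefit": avoided_cost - automation_spend_increase,
    }


# 40% and $80K are from the recruiting example; 10 heads and $120K are hypothetical.
summary = directional_summary(
    team_size=10,
    volume_growth_pct=40,
    automation_spend_increase=80_000,
    loaded_cost_per_head=120_000,
)
print(summary)
```

Under those assumed inputs, the team absorbed the work of roughly four additional hires for $80K of automation spend. The precise figure matters less than whether the direction holds up when the board pushes on it.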

Design for narratability.

Every element of the framework should be tested against the board sentence. Can the CFO credibly say the company is shifting toward AI? Can they prove it? Can they show it's improving operating leverage? If any piece of the framework requires a paragraph of explanation instead of a sentence, it's working against the strategy.

The test is the same one from the previous essay: if the CFO is scrambling to explain a variance, the framework failed. If the CFO is confirming a trend the framework was designed to surface, the framework is working.

What This Doesn't Solve

This framework doesn't solve the measurement problem for judgment-heavy functions where work doesn't generate observable data. It doesn't solve the organizational incentive problem where headcount equals power. It doesn't address governance questions around which AI tools are approved or what level of human review is required.

These are real problems, but they're different problems. Trying to solve all of them in a single budget framework would produce something too complex to implement and too unwieldy to narrate, which would violate the very principle the framework is built on.

The narrower claim is sufficient: for operational work with volume metrics and discrete substitution opportunities, the budget framework can and should be designed to observe, encourage, and narrate the shift from human to AI labor. For judgment work, standard budget management applies until the measurement infrastructure catches up.

The Opportunity

There is a version of the next few years where companies that designed for this shift early have a genuine structural advantage. Their cost bases are more flexible. Their operating leverage improves as AI capabilities improve. Their CFOs can tell a credible, data-backed story about how the company is adapting. And their FP&A teams are positioned as enablers of the transition rather than administrators of a framework that can't see it happening.

There is also a version where companies try to force an AI narrative without the framework to support it. The CFO says "we're becoming AI-native" but the budget still tracks headcount and software as unrelated categories. The board asks for evidence and gets anecdotes. The story sounds aspirational rather than credible.

The difference between those two outcomes is framework design. And framework design, as always, starts with the sentence you want to be able to say.

This is Part 3 of a series on budget design. Part 1 covers budget creep in annual budgets. Part 2 covers narratability as a first-order design consideration.