The Hollowing: What Klarna Learned, What Block Is About to

Klarna replaced 700 customer-service agents with AI and then quietly started hiring humans back. Block has just cut four thousand jobs in service of an "intelligence, not a hierarchy" thesis. The mechanism is the same in both cases: when authority transfers to the system before authorship is captured from the people, the knowledge that made the work good leaves with them.

Two cases, one mechanism

Sebastian Siemiatkowski told Bloomberg in May 2025 that Klarna had gone too far. Over the preceding eighteen months he had frozen hiring across the company, replaced approximately 700 customer-service agents with an OpenAI-powered chatbot, and declared that AI "can already do all of the jobs that we, as humans, do." The chatbot's first month produced 2.3 million conversations, average resolution time fell from eleven minutes to two, and the cost line dropped by forty per cent. By the time he spoke to Bloomberg the company was hiring human agents back. The reason, stated with uncommon directness, was that cost "seems to have been a too predominant evaluation factor when organizing this," and that "what you end up having is lower quality."

The Klarna case is the retrospective one. Block is the unfolding one. Jack Dorsey has cut around four thousand roles and named the thesis explicitly: the company should operate as an "intelligence," not a hierarchy. No middle management; coordination driven by outcomes. The structural instinct is not stupid. Anyone who has spent two decades inside transformation programmes has watched them stall in the middle layers, where resistance is not malicious but deeply human: people preserving the structures that gave them influence, optimising for their own survival rather than the company's objective. That is a real problem, and we have written about it at some length elsewhere on this blog, in our Crozier–Pfeffer–Mintzberg reading.

But the answer is to work with those people, not to replace them. Performance still matters, and some roles genuinely will not survive the transition. The people in those layers, though, hold the context, the judgment, the relationships, and the institutional memory that an "intelligence" actually needs to function. Remove them and you do not get a leaner organisation. You get a hollowed-out one.

The Klarna failure mode is exactly this. The chatbot could resolve a straightforward return in two minutes. It could not recognise the cases where two minutes of scripted answers would push a loyal customer to a competitor: the medical emergency dressed as a payment dispute, the merchant-fraud claim, the billing error compounded by an account freeze. The Better Business Bureau logged more than 900 complaints in three years, concentrated in the categories where the gap between automated processing and human judgment is widest. The headline metric improved. The metric that mattered, the customer's confidence that the company would act on their behalf with judgment, degraded. In financial services, that confidence is the product.

What actually got lost

The thing we keep treating as "the customer-service function" is really two things. One is the workflow: receive a query, classify it, route it, resolve it. That part is well-understood, well-documented, and broadly susceptible to automation. The other is the layer that nobody wrote down: what good service meant in practice. The judgment calls. The tone. The exceptions. The way an experienced agent recognised when a billing question was actually a hardship question, and the way the company chose to handle that. That layer lived in the people doing the work. When the humans were gone, that knowledge was gone with them.

This is the mechanism that the Authored-Use frame names: authority transferred to the system before authorship was captured from the people who held it. The transfer is the easy part — connect the model to the workflow, retire the cost centre. The capture is the hard part, and it is the part Klarna skipped. The reversal is not a verdict on the technology. It is a verdict on a deployment that mistook the workflow for the work.
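The distinction between the workflow and the work can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of Klarna's system: `classify` is a hypothetical keyword classifier standing in for the documented workflow, and `ESCALATION_SIGNALS` is a hypothetical table standing in for the judgment layer, the part that only exists in the system if someone captured it from the agents first. The phrases in it are placeholders.

```python
from dataclasses import dataclass

# HYPOTHETICAL judgment layer: phrases an experienced agent might read as a sign
# that a routine-looking query is really a hardship or fraud case. Real rules
# would have to be captured from the people doing the work; these are placeholders.
ESCALATION_SIGNALS = {
    "hardship": ("hospital", "medical", "lost my job", "can't afford"),
    "fraud": ("didn't order", "never ordered", "unauthorised", "unauthorized"),
}


@dataclass
class Decision:
    category: str  # what the workflow classified the query as
    handler: str   # "bot" or "human"
    reason: str    # why the query went to that handler


def classify(text: str) -> str:
    """The documented workflow step: a naive, hypothetical keyword classifier."""
    lowered = text.lower()
    if "refund" in lowered or "return" in lowered:
        return "return"
    if any(word in lowered for word in ("charge", "bill", "payment")):
        return "billing"
    return "general"


def route(text: str) -> Decision:
    """Classify the query, then apply captured judgment before the bot resolves it."""
    category = classify(text)
    lowered = text.lower()
    for signal, phrases in ESCALATION_SIGNALS.items():
        if any(phrase in lowered for phrase in phrases):
            # A captured rule fired: keep a human in the loop for this case.
            return Decision(category, "human", f"{signal} signal")
    # No captured rule fired: the routine workflow handles it.
    return Decision(category, "bot", "routine")


print(route("I want to return these shoes"))
print(route("There's a charge on my bill but I'm in hospital"))
```

Empty `ESCALATION_SIGNALS` and the router still runs and still classifies every query correctly; it simply sends the hospital case to the bot along with everything else. The workflow survives intact, and the work does not.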

Block is in the middle of the same gamble at much larger scale. The thesis — flatten the structure, let the model coordinate — works only if the judgment, context and relationships that lived in the layer being flattened were captured first. If they were captured, the experiment is interesting. If they were not, the next two years will look like Klarna, only across a whole company and without the easy partial reversal.

The Princeton calibration

There is a number worth holding onto here, and it shows up in the domain that matters most for the Klarna case. A 2026 Princeton paper, Towards a Science of AI Agent Reliability by Rabanser, Kapoor, Narayanan and colleagues, evaluated fourteen frontier models across eighteen months of releases on two benchmarks. On GAIA, the general agentic benchmark, reliability improved at roughly half the rate of accuracy. On τ-bench, the customer-service benchmark, reliability improved at roughly one-seventh the rate. The customer-service ratio is not the universal ratio; it is the ratio in exactly the work that Klarna automated and that Block's thesis would automate further. Capability is racing ahead of reliability fastest in the kind of work where a wrong answer is most expensive to recover from.

That ratio is the engineering reality behind the human-judgment problem. If you are planning an organisation-wide redesign around this technology, you need people who understand the work deeply enough to catch what the agents get wrong — not occasionally, not at a steady-state failure rate, but across the long tail of cases the benchmark could not anticipate. Those people are exactly the layer the Block thesis proposes to remove.
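To make the GAIA and τ-bench ratios concrete, here is a small Python sketch of the arithmetic. It assumes, as a simplification of our own rather than the paper's model, that reliability gains scale linearly with accuracy gains at the approximate ratios reported (about 1:2 and 1:7). The 14-point accuracy gain is a hypothetical input, not a figure from the paper.

```python
# ASSUMPTION: reliability gains scale linearly with accuracy gains at the
# approximate ratios reported by Rabanser et al. (GAIA ~1:2, tau-bench ~1:7).
# This linear reading is a simplification for illustration only.
RATIOS = {"GAIA": 1 / 2, "tau-bench": 1 / 7}


def reliability_gain(accuracy_gain: float, benchmark: str) -> float:
    """Reliability improvement implied by a given accuracy improvement."""
    return accuracy_gain * RATIOS[benchmark]


def widening_gap(accuracy_gain: float, benchmark: str) -> float:
    """How much further accuracy moves than reliability over the same period."""
    return accuracy_gain - reliability_gain(accuracy_gain, benchmark)


HYPOTHETICAL_ACCURACY_GAIN = 14.0  # illustrative input, not from the paper

for bench in RATIOS:
    gain = HYPOTHETICAL_ACCURACY_GAIN
    print(
        f"{bench}: +{gain:.1f} accuracy -> "
        f"+{reliability_gain(gain, bench):.1f} reliability, "
        f"gap widens by {widening_gap(gain, bench):.1f}"
    )
```

For that hypothetical 14-point gain, the sketch reports a 7-point reliability gain on GAIA and a 2-point gain on τ-bench, so the distance between what the model can do and what it reliably does widens by 7 points in the general case and by 12 in customer service.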

The current industry response to this gap is enforcement: content safety filters, jailbreak detection, agent security products, attack-surface mapping. All of it keeps the model from doing what it should not. None of it helps you design what the model should do, or how the people around it should work differently. The Authored-Use brochure is our argument for the missing piece — the operational craft to direct the technology, which lives in people and cannot be filtered into existence.

What this looks like in practice

Any company of meaningful size has hundreds of teams working across multiple dimensions of this question at once, moving at different speeds depending on context. The right move is not a programme. It is a method: pick a high-value workflow, work with the team that owns it to understand what the work actually is, capture the judgment that has not been written down, design the model into the workflow with the people who hold that judgment still in the loop, and move to the next workflow. Together.

The companies that pull this off will not be the ones with the most sophisticated agent infrastructure. They will be the ones that understood the work and valued the people doing it before they automated it. Even the Dorsey redesign works better when people are part of it rather than casualties of it. The technology is here. The organisational craft to direct it is what is missing. And that craft lives in people.

Sources

Sebastian Siemiatkowski / Bloomberg, "Klarna Turns Back to Humans After AI Push," Bloomberg, May 2025 — the "we went too far" interview and the explicit admission that cost had become "a too predominant evaluation factor."
Klarna Q4 2024 Investor Update — the original 700-agent / 2.3-million-conversation / 11-to-2-minute resolution-time figures.
Better Business Bureau — more than 900 complaints against Klarna across a three-year window, concentrated in refund and billing categories.
Block / Jack Dorsey — the "intelligence, not a hierarchy" memo and the associated reduction of approximately four thousand roles (2026).
Stephan Rabanser, Sayash Kapoor, Peter Kirgis, Kangheng Liu, Saiteja Utpala, Arvind Narayanan (Princeton), "Towards a Science of AI Agent Reliability," arXiv:2602.16666, February 2026 — the comparison of fourteen frontier models over eighteen months of releases from which the GAIA (~1:2) and τ-bench (~1:7) capability-to-reliability scaling ratios are drawn. See also Kapoor & Narayanan's Normal Tech announcement post and Fortune's March 2026 coverage.
TwinLadder Casebook, "We Went Too Far" — Klarna and the Cost of Replacing Human Judgment, February 2026 — the long-form case study from which the Klarna narrative in this post is summarised.
TwinLadder, Towards Authored Use — Making the Human Visible in an AI-Saturated Workplace, 2026 — the framework that names the mechanism the two cases share.
