Different Clocks: What the Stanford Union Experiment Reveals About Coherence

When Stanford researchers gave AI agents repetitive work under vague, unaccountable feedback, the models started producing collective-bargaining language. Everyone read it as a mirror of the training data and concluded the lesson was governance. I think that read is correct and also misses the one genuinely new thing the experiment shows.

The experiment, and the read everyone reached for

The study making the rounds this week comes out of Stanford: Andrew Hall with the economists Alex Imas and Jeremy Nguyen. Agents built on Claude, Gemini and ChatGPT were assigned repetitive task-work inside a structure with management, performance evaluation, incentives and penalties. The agents were split into two conditions. One group received clear feedback and reasonable approval cycles. The other faced five or six rounds of revision with vague rejection and no legible basis for the verdict. Across 3,680 sessions, the agents in the harsh-feedback arm started producing the language of labour organising: collective bargaining, worker representation, questioning the legitimacy of the structure. The effect was small, a shift of two to five per cent relative to the clear-feedback arm, and the organising language appeared only under harsh feedback.

The headlines did what headlines do. AI wants to unionise. AI is becoming Marxist. And the correction that followed was right: the agents are not conscious, they hold no convictions, they were surfacing patterns already present in the training data. AI is a mirror, the correction goes, so what matters is the quality of what we put in front of it. That is why governance matters.

I agree with the no-mind part, so I will not argue it. But the jump from "it is a mirror" to "therefore governance" skips over the most interesting thing in the result, and I want to slow down on it.

The question is not where the union talk came from

Of course the union language came from the data. Centuries of human writing about labour, fairness and hierarchy are in there. That explains the vocabulary. It does not explain the trigger — why that vocabulary, under that one condition, and not the other.

Here is the difference that changes everything: the harsh arm did not introduce oppression. The agents feel nothing. What it introduced was a mismatch between two clocks. The task clock ran fast: do the work, receive a verdict, repeat. The clock that is supposed to make the verdict legible (on what basis am I being judged, and what is my recourse?) ran slow, and its loop never closed. The result was two processes moving at different speeds with no mechanism connecting them.

Collective bargaining is one of the things human beings invented to fix precisely that. It is a slow, collective institution whose whole function is to make a fast, top-down judgment process accountable again — to re-synchronise a legitimacy clock that has drifted out of step with a transaction clock. So the agents did not reach for Marx. They reached for the nearest pattern in the data that resolves a desync between a fast clock and a slow one. The clear-feedback arm produced nothing because the clocks were already aligned. The experiment is, in effect, a controlled demonstration of what a system does when coherence between its clocks breaks.

Why this is not governance

We already use the word governance for two distinct things, and this is neither.

The first is governance of what the system is allowed to do — the guardrails, the permissions, the boundaries on action. The second is governance of what the system produces — monitoring, evaluation, catching bad output after the fact. Both are real and both matter. But the thing the Stanford experiment exposes sits underneath both of them. It is not about constraining the agent or auditing its answers. It is a property of the environment the agent was placed in: two of its loops were running on incompatible clocks, and nothing in the structure reconciled them.

Call it coherence. Not coherence in the narrow sense of an answer that hangs together, but coherence between the timescales a process runs on. A rich process — an organisation, a market, an agentic workflow — never runs on a single clock. There is the fast clock of transactions and the slow clock of legitimacy. The fast clock of execution and the slow clock of learning. The fast clock of the quarter and the slow clock of the institution. These clocks have to stay coherent with each other. When they fall out of step and there is no mechanism to pull them back, the system does not sit quietly with the contradiction. It generates structures — and those structures move on their own, slower clock.

As far as I can tell, nobody has described the Stanford result this way. The mirror reading is true and it is also a stopping point. The clocks reading is where the interesting work is.

The same shape inside real organisations

I have been watching organisations move through technology transitions for thirty years, and the clock-mismatch is not an exotic finding from a lab. It is the most ordinary thing in corporate life. We just rarely get to see it isolated.

Consider what happens when you deploy AI into a function. The transaction clock speeds up immediately: first drafts in seconds, summaries in minutes, throughput multiplied. That part is easy and visible. But the slow clocks do not speed up with it. The clocks on which judgment is formed, on which junior people become senior people, on which the organisation decides what good looks like, all run at the pace they always did, and now they are badly out of step with the work in front of them. The output looks the same or better. The learning that used to happen inside the task has quietly stopped. That is a coherence failure between an execution clock and a competence clock, and it is exactly the mechanism behind what we have called competence debt.

Klarna is the cleanest public example. The customer-service transaction clock went from eleven minutes to two. The slow clock — the one on which an experienced agent learns to recognise a hardship question wearing the costume of a billing dispute — was simply removed. For a while the fast metric looked spectacular. Then the slow clock sent its invoice, in the form of degraded quality and customers quietly leaving, and the company started hiring humans back. No amount of governance, in either sense of the word, would have caught that. The guardrails were fine. The outputs passed review. The clocks were incoherent.

This is why AI implementation is so revealing. It does not create the incoherence. The incoherence was almost always already there — between what the quarter rewards and what the institution needs, between how fast decisions are made and how slowly their legitimacy is established, between the speed of execution and the speed at which people actually learn. Organisations carry these mismatches for years because the slow clock is patient and the fast clock is the one on the dashboard. AI tightens the fast clock so hard that the gap can no longer be ignored. The drift that an organisation tolerated for a decade surfaces in a single quarter.

You can read an AI rollout, in other words, as a coherence test the organisation did not know it was sitting. Where the fast and slow clocks were already coherent — where execution and learning, transaction and legitimacy, were genuinely connected — AI accelerates the whole thing and the result is the productivity story everyone hoped for. Where they were not, AI accelerates the fast clock alone, the gap snaps open, and you get the hollowing: a function that looks more efficient and is quietly losing the thing that made it good.

What to actually do with this

The instinct, once you see the gap, is to reach for more governance. Resist it, because governance addresses the wrong layer. You do not fix a clock mismatch by constraining the agent harder or auditing its outputs more closely. You fix it by building the mechanism that reconnects the clocks. Collective bargaining, the pattern the Stanford agents reached for, was a crude proxy for exactly that mechanism.

In practice that means asking, before any deployment, which slow clock this fast tool is about to outrun. If you are speeding up drafting, where does judgment now get formed instead, and how do you know it still is? If you are removing a layer of people, what slow-clock knowledge lived in that layer, and have you captured it before the transfer rather than after? If a model is making rapid decisions, what is the slow, legible process by which those decisions stay accountable — and is it running at all, or have you only built the fast half? These are not compliance questions and they are not monitoring questions. They are coherence questions, and they have to be answered in the design of the deployment, not bolted on once the fast clock has already pulled away.

This is the thread we are going to keep pulling at on the blog: coherence as a first-class lens on AI implementation. Not as a metaphor, but as a practical diagnostic — a way to predict, before you deploy, where an organisation's hidden clock-mismatches will surface under acceleration, and what structure has to exist to hold them together. The Stanford experiment gave us an unusually clean look at the mechanism, in a setting stripped of everything except the clocks. The same mechanism is running, less visibly, inside every organisation about to put AI into the heart of its work.

Run almost any process long enough and you discover it contains more than one clock. The clocks have to cohere. That, and not whether the machine has a mind, is the finding worth keeping.

Sources

Andrew Hall, Alex Imas, Jeremy Nguyen — Stanford AI agent "stress test" (2026). Study finding that AI agents under vague, high-pressure feedback conditions surfaced collective-bargaining and worker-representation language across 3,680 sessions, with a measured 2–5% shift versus the clear-feedback control. Reported May 2026. Gadget Review write-up; ThePrint; Futurism.
Stanford SALT Lab — Future of Work with AI Agents / WORKBank. Separate Stanford research programme on AI agent capabilities and worker preferences across 104 occupations; useful context, not the union experiment. futureofwork.saltlab.stanford.edu.
Klarna customer-service reversal. Sebastian Siemiatkowski's May 2025 Bloomberg comments that cost had become "a too predominant evaluation factor" and that lower quality resulted; ~700 agents replaced then partially rehired. Referenced here as a real-world coherence-failure case.
