Hoffmann et al. (2024) — Generative AI and the Nature of Work
TwinLadder Research Brief · Source Summary · May 2026
Companion reference to The Authority Gap.
Why this paper matters
Most claims about how AI is changing knowledge work rest on small surveys, vendor-released productivity numbers, or task-level lab experiments. Manuel Hoffmann and his Harvard Business School co-authors did something different. They obtained a panel of 187,489 active GitHub developers and tracked them weekly across roughly a year. The result is among the largest and most carefully identified empirical records of what happens to software work when AI is dropped into it.
The headline numbers are widely cited and widely simplified. This brief sets out what the paper actually found, what it did not find, and what its findings can and cannot be made to support in arguments about organisational governance.
What the study did
Hoffmann, Sameer Khan, Frank Nagle, Yi Hao, and Sida Peng (HBS Working Paper 25-021, drafted 2023, revised 2024) used GitHub's internal data to identify a panel of developers who had access to GitHub Copilot. The data covered weekly observations of:
- Time on coding activities (writing and committing code).
- Time on project-management activities (issue triage, pull-request review, repository administration).
- Activity composition — independent versus collaborative work, exploration of new repositories versus exploitation of existing ones.
The panel includes 187,489 distinct developers. The estimation strategy uses access to Copilot as the variation of interest, with controls for developer characteristics, repository activity, and time effects.
The scale and the design matter for two reasons. First, this is administrative data: the platform records activity directly, rather than asking developers to self-report it. Second, the panel is large enough to produce statistically defensible coefficients on activity composition, not merely on aggregate productivity, a level of granularity that is rare in empirical AI-impact studies.
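As a reading aid only: the design described above corresponds to a standard two-way fixed-effects panel specification. The notation below is ours, not the authors'; consult the working paper for the exact estimating equation and identification details.

$$
y_{it} = \beta\,\mathrm{Copilot}_{it} + \gamma' X_{it} + \alpha_i + \tau_t + \varepsilon_{it}
$$

Here $y_{it}$ is the outcome for developer $i$ in week $t$ (time on coding, or time on project management), $\mathrm{Copilot}_{it}$ indicates Copilot access, $X_{it}$ collects the developer- and repository-level controls, $\alpha_i$ and $\tau_t$ absorb developer and week fixed effects, and $\beta$ is the object of interest from which headline percentage effects are derived.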
What the data show
Three findings carry the paper.
1. Coding time rises by 12.4%. Developers with Copilot access spend a larger share of their working time on core coding activities. The increase is statistically significant and its direction is consistent across firm size and developer experience, though, as the third finding below makes clear, its magnitude is not uniform.
2. Project-management time falls by 24.9%. Developers with Copilot access spend substantially less time on issue management, pull-request review, and repository coordination. This is the more surprising finding and the one that has been most cited — and most simplified.
3. The activity shift is concentrated among less-experienced developers. Junior developers see the largest gains in coding time and the largest drops in project-management time. The effect is not uniform across the experience distribution.
Two underlying mechanisms drive the shift, according to the authors:
- Independent work increases. Developers with Copilot rely less on collaboration with other developers for code-level questions; the model substitutes for some peer help.
- Exploration increases. Developers with Copilot are more likely to start work on new repositories rather than continue existing ones — a measurable shift toward exploring new problems rather than exploiting familiar ones.
What the data do not show
This is the part most relevant to the Authority Gap argument and most often misread.
The study measures time allocation, not who decided what. The 24.9% reduction in project-management activity tells us that developers with Copilot spend less time on tasks coded as project management. It does not tell us — and was not designed to tell us — that decision rights migrated, that authority was reallocated, or that middle managers' coordination work has been absorbed at the front line. Those interpretations may be consistent with the data, but they are not established by it. They require additional argument and additional evidence.
This matters because the Hoffmann et al. paper has been cited (including in early drafts of the Authority Gap research piece itself) as if it directly demonstrates the migration of coordination decisions. It does not. The paper demonstrates the time-allocation prerequisite for that migration: less developer time is being spent on traditional coordination activity. Whether the coordination work has disappeared, been absorbed by managers, been redefined and relocated to the front line, or been replaced by a new category of coordination — coordinating the model itself — is a question the data are silent on.
The careful reading, which the Authority Gap piece develops, is that coordination work has been redefined and re-localised, not absorbed: the work that used to mean coordinating people increasingly means coordinating an AI agent's contribution to one's own task, and that work necessarily sits next to the production task rather than above it.
The authors' explicit warning about junior hiring
The single most pointed sentence in the paper is the authors' assessment of the strategy of cutting junior hiring on the assumption that AI fills the gap. They call this a "profound strategic error." Their reasoning is empirical: the largest gains from Copilot accrue to less-experienced developers. The complementarity is what accelerates skill development. Firms that respond to Copilot by reducing graduate intake are removing precisely the cohort whose productivity Copilot most amplifies, and they are eroding the pipeline through which mid-career and senior expertise is produced.
This conclusion connects directly to the Brynjolfsson, Chandar, and Chen finding (separately summarised in this series) that early-career workers in AI-exposed occupations are showing measurable employment declines in payroll data. The two papers, read together, document a pattern: AI is being deployed in ways that disproportionately benefit junior workers, while organisations are simultaneously hiring fewer of them. The structural mismatch is not subtle.
What the paper supports — and what it does not
The Authority Gap research piece is deliberate about what it takes from Hoffmann et al. The careful citation is:
A 24.9% reduction in developer project-management activity is what you would expect when the coordination work itself has been redefined — from coordinating people on a roadmap to coordinating an AI agent's contribution to the developer's own task — and re-localised to the role doing the production work. The study itself measures time allocation, not decision rights, so the inference is directional rather than established.
The careless citation, which the broader productivity literature has begun to make, is:
"AI is absorbing middle-manager coordination work into the front line."
The first formulation is honest about what the paper does and does not establish. The second imports an interpretive layer the data cannot bear. Distinguishing the two is the difference between a citation that survives a sceptical reader's attention and one that doesn't.
How this brief connects to the Authority Gap
The Authority Gap framework's "aggregation authority" (the third of the framework's four moments, the one that operates across roles) is the question of who holds formal authority over the in-flight decisions an AI tool now lets a frontline role make in real time. The Hoffmann et al. data are the closest large-scale empirical evidence that the time-allocation prerequisite behind that question is in place. The paper does not by itself answer the authority question. It establishes that the coordination workload at the front line has shifted enough to make the authority question worth asking.
Citation
Hoffmann, M., Khan, S., Nagle, F., Hao, Y., & Peng, S. (2024). Generative AI and the Nature of Work. Harvard Business School Working Paper 25-021. hbs.edu/ris/download.aspx?name=25-021.pdf
TwinLadder Research Briefs are short reference summaries of the foundational sources cited in our research pieces. They are not commentary; they are background reading. Companion to the Authority Gap launch series, May 2026.

