When the Models Get Better, Does the Problem Go Away?

Sullivan & Cromwell — the firm that advises OpenAI on the safe deployment of AI — just filed a Chapter 15 emergency motion riddled with AI hallucinations. Liga reads the apology letter and asks: if better models still produce undetectable errors, what is actually broken? Four bets companies are making, one structural question, and why the simulator is still missing.

Sullivan & Cromwell · AI Hallucinations · Apprenticeship · Competence · Article 4 · Legal AI
25 April 2026 · 17 min read

TwinLadder Weekly

Issue #33 — When the Models Get Better, Does the Problem Go Away?

25 April 2026 · Bi-weekly intelligence on AI competence, governance, and the workforce


Editor's Note

From Alex —

We started this newsletter to track one specific question: where competence is forming, and where it is breaking, as AI moves into the work. Last week sharpened the question more than any single event has done so far.

Sullivan & Cromwell — the firm that advises OpenAI on the safe deployment of artificial intelligence — filed an emergency motion in a federal bankruptcy court that turned out to be full of AI hallucinations. Fabricated citations. Misquoted statutes. Cases that did not exist. The errors were caught not by the firm, but by opposing counsel.

The dominant industry response so far has been: this is a temporary problem. The models will get better. The hallucinations will go away. The profession will adjust.

I do not think this is right, and I want to spend this issue on why. Liga has the analysis. The short version of the answer is that it is not a model problem.

— Alex


The Sullivan & Cromwell Case

On 18 April, Andrew Dietderich, co-chair of Sullivan & Cromwell's restructuring practice, sent a three-page letter to Chief Judge Martin Glenn of the US Bankruptcy Court for the Southern District of New York. The letter apologised for "inaccurate citations and other errors" in an emergency motion filed nine days earlier in the Chapter 15 proceedings of Prince Global Holdings Ltd. — "some of which" were "artificial intelligence ('AI') 'hallucinations.'" Attached was a Schedule A running to multiple pages, replacing fabricated case citations, fixing misquoted Chapter 15 precedents, and removing references to authorities that did not appear to exist.

The case is In re Prince Global Holdings Limited and Paul Pretlove, Bankr. S.D.N.Y. Case No. 1:26-bk-10769. The errors were identified by opposing counsel at Boies Schiller Flexner.

Read the letter carefully. The apology says three things in sequence: the firm has comprehensive AI policies, the firm has mandatory training, the firm tells lawyers to "trust nothing and verify everything." None of that prevented the filing.

The official explanation is that the motion was an emergency, and the usual review cycle was compressed. This is true. It is also where the problem actually lives — and where the dominant industry narrative misses what is happening.


Why Better Models Make This Harder, Not Easier

The 2023 hallucination cases — Mata v. Avianca and the wave that followed — were obvious in retrospect. The fabricated cases did not sound like real cases. The reasoning fell apart on a second read. The lawyers who got sanctioned were not stupid; they were operating without the trained reflex to question output that sounded authoritative, and the output of that era was, with hindsight, easy to question.

The 2026 hallucinations are different. The Sullivan & Cromwell errors slipped past one of the most credentialed legal teams in the United States, supervised by partners with decades of restructuring experience, in a firm that advises the world's most prominent AI lab on safe deployment. The errors did not slip through because anyone was careless. They slipped through because the AI output now sounds exactly like what a senior partner would write — at the formatting level, at the reasoning level, at the citation-pattern level.

The problem getting solved is plausibility. The problem getting harder is detection.

This is the inverse of what most boards have been told. The narrative since 2023 has been: hallucinations are a teething issue, they will reduce as models improve, the profession will adapt. What is actually happening is that the rate of hallucinations may be falling — the published academic evidence on this is mixed — but the detectability of any given hallucination is also falling, because the output is more confident, more correctly structured, more semantically coherent. The error-per-page rate could halve and the missed-error-per-page rate could still rise, because the errors that remain are exactly the ones a competent reviewer would not catch. Put illustrative numbers on it: if a draft drops from four hallucinated citations to two, but the reviewer's catch rate falls from nine in ten to one in two, the expected number of errors reaching the final document rises from 0.4 to 1.0 per draft.

Damien Charlotin's hallucination database now tracks more than 1,000 incidents in court proceedings worldwide, 90% of them sanctioned in 2025 alone, and four to five new cases are being added per day. The curve does not look like a problem solving itself.


Four Bets, One Question

The companies that depend most on apprenticeship are responding to the AI shift in four distinct ways, and the contrast between them is the clearest signal we have about how this question is actually being thought through.

The contraction bet

The Big Four are quietly shrinking the base of the pyramid. KPMG cut its UK graduate intake from 1,399 places to 942 between 2023 and 2024, a drop of roughly a third. Deloitte cut 18%, EY 11%, PwC 6%. Across the sector, UK accountancy graduate job adverts fell 44% compared with 2023. PwC's global chair has said explicitly that the firm wants "a different set of people" — more engineers, fewer generalist analysts.

The implicit theory is that AI substitutes for entry-level work. Fewer juniors needed. The pyramid becomes an obelisk. The bet is that the senior pipeline is a problem for 2030, not for now.

The reconfiguration bet

The big law firms are doing the opposite, and doing it with their balance sheet. Latham & Watkins flew all 400 of its US first-years to Washington for a two-day AI Academy and has repeated the programme with every cohort since. Ropes & Gray went further: first-year associates may now dedicate up to 400 non-billable hours per year — roughly a fifth of their billable target — to AI training, tool experimentation, and mentoring circles across 15 approved tools.

Sit with that number. A first-year associate at a top US firm bills out at roughly $700–$900 per hour, so 400 hours is between $280,000 and $360,000 of forgone revenue per associate per year. That is not a training budget. It is a recognition that the apprenticeship has broken — that the tuition juniors used to absorb as a by-product of billable work is no longer arriving for free, and the firm now has to pay for it explicitly.

A&O Shearman, the first law firm to deploy generative AI firm-wide, has gone further still: trainees are actively encouraged to trial agentic legal AI on live matters under supervision. The bet is that AI deployment is so risky that juniors need more structured exposure, not less, and the only way to build the trained reflex is sustained, accountable contact with the tools under conditions that approach real practice.

The contrast with the Big Four is total. One side treats AI as a substitute for juniors. The other treats AI as something so dangerous to deploy at scale that the only sustainable response is to pay for the apprenticeship that the technology has displaced.

The expansion bet

One firm at scale is betting the opposite of the Big Four. In February 2026, IBM Chief HR Officer Nickle LaMoreaux said: "We are tripling our entry-level hiring, and yes, that is for software developers and all these jobs we're being told AI can do." This followed CEO Arvind Krishna's October 2025 commitment to hire more college graduates over the next twelve months than IBM had hired in the past few years.

The wager is specific and worth naming clearly. If the rest of the industry thins its pipeline, there will be a senior-talent shortage in the mid-2030s, and the firms that preserved their juniors will own that decade. IBM may be wrong. It may be carrying excess cost into a leaner industry. Or it may be making the most contrarian-correct enterprise hiring decision of the period.

What matters is that IBM is the only major firm at scale to state a pipeline-preservation strategy in public.

The silence

JPMorgan's LLM Suite onboarded 200,000 users within eight months of its 2024 launch; bankers now generate pitch decks in 30 seconds that "previously took a junior analyst hours." Goldman Sachs rolled out its AI Assistant firm-wide in mid-2025; by July, 46,000 staff were issuing over a million prompts a month.

Both firms have deployed, at scale, the exact tool that eliminates the first-draft work juniors used to learn from. Neither has cut headcount; the task is what is gone, not the body. Neither firm has publicly addressed what replaces the apprenticeship.

Of the four bets — contraction, reconfiguration, expansion, silence — only one of them, reconfiguration, is also a strategy. The other three are decisions made and not yet examined.


Why Sullivan & Cromwell Sits at the Centre of This

Place the Sullivan & Cromwell case against this map of bets and a specific picture forms.

S&C is not a firm that ignores the problem. By every public signal, S&C is doing something close to the reconfiguration bet — comprehensive AI policies, mandatory training, "trust nothing and verify everything" as a stated rule, an active advisory relationship with OpenAI on the topic of safe deployment. By any reasonable measure, this is one of the firms taking the question seriously.

And the hallucinations still reached the docket.

The lesson is not that policies fail. The lesson is that training is not competence. A two-day AI Academy is a literacy event. It teaches what the tool does. It does not, by itself, build the trained reflex that catches an AI-generated citation that does not feel quite right at 2 a.m. with a 9 a.m. deadline. That reflex only forms through repeated exposure to judgement-under-pressure scenarios, calibrated against ground truth, sustained over months and years.

Aviation figured this out fifty years ago. Every commercial pilot returns to the simulator throughout their career — not as a graduation event, but as a recurring discipline. The simulator preserves, in testable form, the institutional memory of the profession: every emergency, every failure mode, every edge case anyone has ever encountered. Pilots are calibrated against that memory continuously. When a new failure mode is discovered, the simulator updates, and every pilot encounters the new scenario in their next check ride.

Knowledge work has built two of the three layers aviation built. Governance is in place — the EU AI Act is binding, sectoral rules are coming. Infrastructure is in place — model registries, vendor frameworks, audit trails. The third layer, the equivalent of the simulator — the continuous maintenance of individual competence against a living memory of the role — is almost entirely missing.

Ropes & Gray's 400 hours is the closest thing to that third layer that we have seen any major firm install. It is not the simulator yet. It is recognition that the simulator is what is missing.


What This Means for Boards Right Now

Three things follow from the case, and from the wider pattern.

One. A written AI policy is necessary but not sufficient. Sullivan & Cromwell had a policy. It did not prevent the hallucinations. The competence layer underneath the policy is what determines whether the policy holds under pressure, and competence requires assessment, training, and measurable evidence — not just a document on the intranet.

Two. Compressed timelines are the highest-risk environment for AI use, not the lowest. Most firms allow more AI assistance under time pressure, on the implicit theory that AI saves time. The S&C case shows that compressed timelines are precisely when AI hallucinations are most likely to reach the final document, because the verification step is the first thing that gets cut. AI use under time pressure should be governed more tightly, not less.

Three. Better models do not solve this. They make it harder. A board that is waiting for the next model release to fix the hallucination problem is making a category error — the problem is not that the AI is wrong too often; it is that nobody is calibrating the human reviewer to catch the cases where the AI is wrong in newly subtle ways.


What We Are Watching Next

  • Whether any other Big Four firm publicly reverses its graduate-intake contraction
  • Whether any major investment bank publicly addresses the apprenticeship question raised by million-prompt-per-month deployments
  • Whether Sullivan & Cromwell — or another elite firm caught by the same kind of hallucination — publishes a substantive review of how its training and protocols actually translate into trained competence at the associate level
  • Whether the EU AI Office issues any guidance on Article 4 implementation that addresses competence-vs-literacy distinctions specifically

The next issue will go deeper into one of these. If you want a specific function or sector covered, reply to this email.

— Liga


TwinLadder Weekly is a bi-weekly intelligence report on AI competence, governance, and the workforce. Subscribe at twinladder.ai/newsletter. Forward this issue freely.