TwinLadder

Issue #8

Harvey Adds Anthropic and Google Models: Technical and Commercial Analysis

Harvey's shift from OpenAI-exclusive to multi-model routing has implications for pricing, performance, and data governance. We examine the technical architecture and what it signals about vendor lock-in concerns.

Harvey
Multi-Model AI
Anthropic
Google
Architecture
May 23, 2025 | 16 min read

TwinLadder Weekly

Issue #8 | May 2025


Editor's Note

Harvey just did something that should make every lawyer reconsider how they think about AI tools. On May 13, it announced the integration of models from Google and Anthropic alongside its existing OpenAI infrastructure. Harvey — one of OpenAI's most prominent early portfolio companies — effectively told its primary investor's competitors: we need you too.

This is not a technology story. It is a market power story. When the legal AI platform with the deepest OpenAI relationship decides that no single model is good enough, it signals something practitioners need to understand: the era of "which AI should we use?" is over. The question is now "which AI should we use for this specific task?"

I discussed this with Edgars Rozentals, who tracks agentic AI systems and multi-model architectures. His observation was sharp: "The moment Harvey decoupled from a single model provider, they signalled that the intelligence layer is a commodity. The value is in the orchestration — knowing which engine to use for which legal task. Most firms are still debating whether to adopt AI at all. Harvey is already past that question entirely."

If Harvey, with all its resources, needs three different engines to serve its clients well, what does that tell us about the single-model tools the rest of us are using?


Why Harvey Went Multi-Model (And What It Means for Your Firm)

The BigLaw Bench Results

Harvey's BigLaw Bench testing exposed something the vendor marketing obscures: different models are good at different things, and the differences matter for legal work.

Gemini 2.5 Pro excels at drafting and can process over a million tokens of context — meaning entire transaction data rooms, not just individual documents. But it struggles with trial preparation and reasoning about complex evidentiary rules like hearsay exceptions. Claude 3.7 Sonnet and OpenAI's o1 handle complex reasoning and evidentiary analysis better but lack the context window for massive document review. The platform now routes tasks to whichever model performs best for that specific type of legal work.

Model | Strength | Limitation
Gemini 2.5 Pro | Drafting, 1M+ token context windows | Weaker on evidentiary reasoning
Claude 3.7 Sonnet | Complex reasoning, nuanced analysis | Smaller context window
OpenAI o1 | Structured legal reasoning | Less effective for massive document review
Harvey's routing layer | Task-optimised model selection | Opaque to end users
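Harvey's actual routing logic is proprietary and opaque to users, but the behaviour the table describes can be sketched. The following is an illustrative toy only — the task categories, model identifiers, and failover order are assumptions drawn from the table above, not Harvey's implementation:

```python
# Toy sketch of task-based model routing with failover.
# Task categories, model names, and orderings are illustrative
# assumptions; Harvey's routing layer is proprietary.

ROUTING_TABLE = {
    # task type            -> preferred models, in failover order
    "drafting":              ["gemini-2.5-pro", "claude-3.7-sonnet"],
    "large_doc_review":      ["gemini-2.5-pro"],           # needs 1M+ token context
    "evidentiary_analysis":  ["claude-3.7-sonnet", "o1"],  # reasoning-heavy
    "structured_reasoning":  ["o1", "claude-3.7-sonnet"],
}

def route(task_type: str, available: set) -> str:
    """Pick the best available model for a task, falling back in order."""
    candidates = ROUTING_TABLE.get(task_type, ["claude-3.7-sonnet"])
    for model in candidates:
        if model in available:
            return model
    raise RuntimeError(f"No model available for task '{task_type}'")

# Failover in action: if Gemini is unavailable, drafting falls back to Claude.
print(route("drafting", {"gemini-2.5-pro", "o1"}))     # preferred model is up
print(route("drafting", {"claude-3.7-sonnet", "o1"}))  # failover path
```

The point of the sketch is the second call: when the preferred engine is down, the task still completes — which is the single-vendor-dependency argument made below, in miniature.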

For practitioners, the implications are more significant than the technical details suggest.

Four Implications That Matter

First, model commoditisation is accelerating. If Harvey treats models as interchangeable components selected by task, the value is in the routing intelligence and the legal-specific training layer — not in which foundation model sits underneath. This means the "we use GPT-4" pitch from your current vendor is increasingly meaningless. Ask instead: how do you select the right model for the right task?

Second, single-vendor dependency is a liability. When OpenAI has an outage, single-model platforms go dark. Multi-model architecture provides failover. But it also introduces new problems. A mid-size litigation boutique reported that Harvey routed a complex evidentiary argument to Claude instead of GPT-4, and the reasoning depth improved dramatically. Good outcome. But three weeks later, the same type of prompt produced stylistically different analysis because a different model was selected. The partner said: "I cannot build muscle memory for what the tool produces. Every time feels like working with a different associate."

Third, the audit trail just got more complicated. When a client questions a bill for AI-assisted research, "the AI did it" was already inadequate. "Two different AIs did it and we cannot clearly explain which one did what" is worse. Multi-model capability is a compliance and explainability challenge that firms have not grappled with yet. For European firms subject to the EU AI Act's transparency obligations under Articles 13 and 14, this opacity is not merely inconvenient — it is a regulatory risk. [MODERATE CONFIDENCE]

And fourth, your prompt engineering may be worthless. One firm in Frankfurt invested heavily in prompt libraries optimised for GPT-4. When Harvey began routing tasks to Claude and Gemini, those carefully crafted prompts produced inconsistent results. The associate responsible described it as "basically starting over." Prompt engineering is not model-agnostic, and multi-model platforms may require multi-model prompt development.

The Valuation Question

Harvey's 80x revenue multiple — the $3 billion valuation on roughly $75 million ARR — makes more sense viewed through this lens. The bet is not on any particular AI model. It is on Harvey's ability to build the orchestration layer that sits between models and legal work. Whether that orchestration layer is worth 80x revenue is a venture capital question, not a legal technology question. I have my doubts. But the underlying capability is real.

Metric | Harvey | Market Context
Valuation | $3B (May 2025) | Highest legal AI valuation globally
Est. ARR | ~$75M | Revenue multiple: ~80x
Enterprise customers | 500+ across 53 countries | Weekly active users growing 4x YoY
Am Law 100 penetration | 50+ firms | ~50% of the top tier
Public SaaS norm | n/a | 6-12x ARR multiple

Edgars Rozentals adds a technical note worth hearing: "Multi-model routing is the beginning of agentic architecture. Today Harvey picks the best model per task. Tomorrow it will chain models together — one to research, another to draft, a third to verify. The firms that understand this progression will be ready. The firms that think multi-model is just a feature upgrade will be surprised." [HIGH CONFIDENCE]
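To make that progression concrete: chaining means one model's output becomes the next model's input. A minimal sketch, where `call_model` is a stand-in for a real provider API and the stage-to-model assignments are hypothetical illustrations, not Harvey's architecture:

```python
# Hypothetical sketch of model chaining: research -> draft -> verify.
# call_model is a placeholder for a real API client; the stages and
# model assignments are illustrative assumptions.

def call_model(model: str, prompt: str) -> str:
    # Placeholder: a real implementation would call the provider's API here.
    return f"[{model} output for: {prompt[:40]}]"

def run_chain(matter: str) -> str:
    # Stage 1: long-context model gathers material across the corpus.
    research = call_model("gemini-2.5-pro", f"Research authorities on: {matter}")
    # Stage 2: drafting-strong model turns research into a memo.
    draft = call_model("claude-3.7-sonnet", f"Draft a memo using: {research}")
    # Stage 3: reasoning-strong model checks the draft's logic.
    verified = call_model("o1", f"Check the reasoning in: {draft}")
    return verified
```

The structural point survives the toy: in an agentic chain, no single model sees the whole task, which is exactly why "multi-model" is an architecture decision rather than a feature toggle.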


The Competence Question

A partner at a large European firm described her experience after Harvey's Gemini integration: "We loaded an entire data room — two thousand documents — and asked questions across the full corpus. What took a week compressed to two days." Impressive. But when I asked whether her associates understood the documents they had queried, she hesitated.

There is a difference between finding the answer in two thousand documents and understanding the transaction those documents represent. Context windows are getting larger. The ability to absorb and synthesise the meaning of what sits inside them remains a human capability. An associate who can prompt a million-token model to extract every change-of-control provision across a data room has accomplished a mechanical task. An associate who understands why that change-of-control provision matters in the context of the buyer's integration strategy, the target's key employee retention concerns, and the regulatory implications across three EU jurisdictions — that associate has exercised judgment.

When your tools get dramatically more powerful, the temptation is to let them do more. Multi-model routing means the AI can handle drafting, reasoning, and large-scale review with increasing sophistication. What it cannot do is tell your associate when the analysis is wrong in ways that matter commercially but not legally. The clause is technically compliant. The deal still will not work. That is the judgment gap, and larger context windows make it easier to miss.


What To Do

  1. Ask your AI vendors the model question. "Which models does this use?" and "How does it decide which model handles which task?" If they cannot answer clearly, they have not thought about it seriously.

  2. Test for consistency. Run the same prompt three times on different days. If you get meaningfully different outputs, understand that multi-model routing (or model updates) may be the cause. Build your workflows to accommodate variability, not assume uniformity.

  3. Strengthen your audit documentation. If your platform uses multiple models, your records should reflect which model produced which output. For malpractice and compliance purposes — and particularly for EU AI Act transparency requirements — "the AI wrote it" is not sufficient. You need to know which AI.

  4. Do not rebuild prompt libraries blindly. Before investing in model-specific prompt engineering, assess whether your platform even exposes model selection. If routing is automatic and opaque, optimising for a specific model is futile.

  5. Evaluate multi-model platforms against your actual practice. For firms with narrow, predictable workflows, single-model simplicity may outweigh routing benefits. For diverse practices handling everything from brief writing to due diligence, task-optimised routing delivers measurable improvement. Match the tool's architecture to your needs.
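For point 3 above, even when a platform does not expose its routing, firms can keep their own records. A minimal sketch of the kind of per-output audit entry worth capturing — the field names are suggestions for illustration, not a regulatory standard:

```python
# Minimal sketch of an AI-output audit log: which model, which prompt,
# which output, when. Field names are illustrative suggestions only.
import hashlib
import json
from datetime import datetime, timezone

def audit_entry(matter_id: str, model: str, prompt: str, output: str) -> dict:
    """Record enough to answer 'which AI produced this?' later."""
    return {
        "matter_id": matter_id,
        "model": model,  # as reported by the platform, where exposed
        "timestamp": datetime.now(timezone.utc).isoformat(),
        # Hashes prove which text was used without storing privileged content
        # in the log itself.
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "output_sha256": hashlib.sha256(output.encode()).hexdigest(),
    }

entry = audit_entry("M-1024", "claude-3.7-sonnet",
                    "Summarise clause 9.2", "Clause 9.2 provides that ...")
print(json.dumps(entry, indent=2))
```

Hashing rather than storing the text keeps privileged material out of the log while still letting you match a logged entry to a specific prompt and output if a bill or filing is later questioned.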


Quick Reads

  • Harvey's growth metrics: 500+ enterprise customers across 53 countries, weekly active users growing 4x year-over-year, 50+ Am Law 100 firms. The platform's market position is substantial regardless of how you view the valuation.

  • TechCrunch frames the announcement as an Anthropic and Google win over OpenAI — the competitive dynamics between foundation model providers are reshaping enterprise legal AI faster than most firms are tracking.

  • Both Anthropic and Google models are integrated through AWS Bedrock and Google Vertex with equivalent enterprise security guarantees — the Azure-only era for legal AI infrastructure is ending.

  • Sacra estimates Harvey's ARR at $75M+ — impressive growth but still representing an 80x revenue multiple at the current $3 billion valuation. Enterprise legal AI is a bet on the future, priced accordingly.


One Question

If the leading legal AI platform needs three different AI engines to serve its clients adequately, how confident should you be in the single-model tool your firm adopted last year?


TwinLadder Weekly | Issue #8 | May 2025

Helping European professionals build AI competence through honest education.

Included Workflow

Multi-Model Strategy Assessment

Framework for evaluating multi-model AI architecture for legal practice. Covers current state analysis, use case mapping, multi-model evaluation, and risk assessment.
