Multi-Model Architecture: What Harvey's Routing Decision Actually Means Under the Hood
Harvey started on a single OpenAI stack. By 2025, it was routing queries across OpenAI, Anthropic, and Google models. Here is what that architectural shift involves, what it costs, and why it matters for anyone evaluating legal AI tools.
In November 2022, Harvey launched as an OpenAI Startup Fund investment — essentially a GPT wrapper for legal work. One model provider, one API, one dependency. That was a defensible engineering decision at the time. OpenAI was the only viable option for production-grade legal AI, the alternatives were months behind, and time to market mattered more than architectural flexibility.
By 2025, Harvey had integrated Anthropic's Claude and Google's Gemini alongside its existing OpenAI partnership. The system now routes queries to whichever model performs best for a specific legal task type.
That is not a minor product update. It is a fundamental change in system architecture. And the engineering trade-offs involved are worth understanding, because they apply to every organisation deploying AI for professional work.
How Multi-Model Routing Works
Let me open the hood on what "multi-model architecture" actually means in practice, because the marketing language obscures the engineering.
A single-model system is straightforward. User query goes in, one model processes it, output comes back. One API, one set of rate limits, one pricing model, one set of failure modes. Simple to build, simple to debug, simple to maintain.
A multi-model system adds a routing layer between the user and the models. That layer has to make a decision: which model gets this query? The routing logic can operate on several criteria:
- Task classification. Categorise the query (reasoning, extraction, drafting, research) and send it to the model that benchmarks highest for that category.
- Context length. If the input exceeds a model's context window, route to one with a larger window.
- Cost optimisation. Simple queries go to cheaper models. Complex queries go to frontier models.
- Latency requirements. Time-sensitive tasks route to faster models.
- Cascading. Start with a cheaper model. If confidence is below threshold, escalate to a more capable one.
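The criteria above can be sketched as a small routing layer. This is a minimal illustration, not Harvey's implementation: the model names, the keyword classifier, and the route table are all placeholder assumptions standing in for a learned classifier and real provider endpoints.

```python
# Minimal sketch of a routing layer: classify the task, then pick a model.
# Model names and the keyword classifier are illustrative placeholders.

ROUTES = {
    "reasoning":  "frontier-reasoning-model",   # statutory interpretation, risk analysis
    "extraction": "long-context-model",         # due diligence, contract data extraction
    "drafting":   "fluent-writing-model",       # correspondence, memoranda
    "triage":     "fast-cheap-model",           # classification, routine queries
}

def classify(query: str) -> str:
    """Crude keyword classifier standing in for a trained task classifier."""
    q = query.lower()
    if any(w in q for w in ("interpret", "analyse", "risk")):
        return "reasoning"
    if any(w in q for w in ("extract", "clause", "diligence")):
        return "extraction"
    if any(w in q for w in ("draft", "letter", "memo")):
        return "drafting"
    return "triage"

def route(query: str) -> str:
    """Return the model a query should be dispatched to."""
    return ROUTES[classify(query)]
```

In production the classifier would itself be a model call or a trained lightweight classifier, and the route table would also weigh context length, cost, and latency, per the criteria above.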
Harvey's own description: the change provides "more optionality in selecting the best models for particular legal tasks." For most users, "this change will only be felt in results." That last part is the key engineering goal — the routing should be invisible.
The Sensor Fusion Analogy
I built autonomous drones at AirDog before DJI dominated the consumer market. The drone had to track a moving human in three dimensions — GPS, accelerometer, video feed, altitude sensor, all feeding into one navigation decision. No single sensor was sufficient. GPS drifts. Accelerometers accumulate error. Video loses the subject in poor light. The system worked because we fused multiple imperfect signals into one reliable output.
Multi-model AI architecture is the same principle. Each model is a sensor with its own strengths and blind spots:
| Model strength | Legal task fit | Weakness |
|---|---|---|
| Strong chain-of-thought reasoning | Statutory interpretation, case analysis, risk assessment | Slower, more expensive per token |
| Large context window, accurate extraction | Due diligence, contract data extraction, document classification | May produce wooden prose |
| Natural language fluency, tone matching | Client correspondence, memoranda drafting, legal writing | May introduce subtle analytical errors |
| Fast inference, lower cost | Document triage, simple classification, routine queries | Lower ceiling on complex reasoning |
No single model excels across all four categories. Routing lets you match the right model to the right task — the same way sensor fusion matches the right data source to the right navigation decision.
Single-Vendor vs. Multi-Model: The Engineering Trade-Offs
This is not a case where multi-model is simply "better." There are real trade-offs. Here is the comparison:
| Factor | Single-vendor | Multi-model |
|---|---|---|
| Implementation complexity | Low. One API, one SDK, one auth flow | High. Multiple APIs, normalised interfaces, routing logic |
| Latency | Predictable. One network hop | Variable. Routing adds decision time. Different models have different response speeds |
| Cost control | Simple. One pricing model | Complex but potentially lower overall. Route cheap tasks to cheap models |
| Output consistency | High. Same model, same style | Lower. Different models produce different tones, formats, reasoning styles |
| Vendor lock-in risk | High. If the provider raises prices or degrades quality, you are stuck | Low. Swap models without rebuilding the system |
| Debugging | Straightforward. One model to audit | Complex. Need to trace which model produced which output |
| Best-of-breed performance | Capped by one model's ceiling | Higher. Each task gets the strongest available model |
| Time to production | Faster. Less to build | Slower. Routing logic, testing across models, normalisation |
| Resilience | Single point of failure. If the provider has downtime, everything stops | Failover possible. Route around outages |
The honest assessment: multi-model delivers better results at higher engineering cost. Whether the trade-off is worth it depends on task diversity and volume.
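The resilience row in the table deserves a concrete shape. A failover wrapper is one common pattern: try the preferred provider, fall through to the next on an outage. The exception type and the provider callables here are stand-ins; real provider SDKs each have their own clients and error classes.

```python
# Sketch of provider failover: try the preferred model, fall back on outage.
# ProviderDown and the provider callables are illustrative stand-ins.

class ProviderDown(Exception):
    pass

def call_with_failover(query, providers):
    """providers: ordered list of (name, callable). Returns (name, output).

    Tries each provider in preference order; raises only if all fail.
    """
    errors = []
    for name, call in providers:
        try:
            return name, call(query)
        except ProviderDown as exc:
            errors.append((name, str(exc)))  # record and try the next provider
    raise RuntimeError(f"all providers failed: {errors}")
```

This is the "route around outages" behaviour from the table: the single-vendor setup has nowhere to fall through to.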
Pay Attention To: The Governance Problem
Here is the part that most multi-model discussions skip, and the one with the sharpest consequences for legal work.

When a single model produces an output, you know where it came from. When a routing system selects among three or four models, tracing provenance gets complicated.
Specific governance questions to ask any multi-model vendor:
- Can you log which model produced each output?
- Can you reproduce a result — same input, same model, same parameters?
- If a model is updated by its provider, do you version-track which model version was active when a specific output was generated?
- How do you handle inconsistent outputs when the same query routes to different models on different days?
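The first three questions above reduce to one engineering requirement: every output carries a reproducibility record. A minimal sketch of such a record follows; the field names are illustrative assumptions, not any vendor's schema.

```python
# Sketch of an audit record for multi-model provenance: log which model,
# which version, and which parameters produced each output.
# Field names are illustrative, not a real vendor schema.

import datetime
import hashlib

def audit_record(query: str, output: str, model: str, model_version: str,
                 params: dict) -> dict:
    """Build a record sufficient to trace and reproduce one generation."""
    return {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "model": model,                  # which model produced this output
        "model_version": model_version,  # version active at generation time
        "params": params,                # temperature etc., needed to reproduce
        "query_hash": hashlib.sha256(query.encode()).hexdigest(),
        "output_hash": hashlib.sha256(output.encode()).hexdigest(),
    }
```

Hashing the query and output rather than storing them verbatim is one way to keep an audit trail without duplicating privileged material in logs; whether that satisfies a firm's retention obligations is a separate question for counsel.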
For legal work, where accuracy is a professional obligation and errors carry liability, these are not theoretical concerns. If a due diligence report contains an error and you cannot trace which model generated it, your audit trail has a gap.
This is not a reason to avoid multi-model architectures. The performance benefits are real. It is a reason to verify that governance keeps pace with architecture.
Red Flags When Evaluating Legal AI Tools
Whether a vendor uses single-model or multi-model architecture, these signals warrant scrutiny:
- "We use the latest AI." Which model? Which version? If the vendor cannot or will not specify, they are selling marketing, not engineering.
- No model disclosure. You are entitled to know what processes your data. This is not a trade secret — it is a dependency you are inheriting.
- Accuracy claims without methodology. "95% accuracy" means nothing without knowing: accuracy at what task, measured how, on what dataset, verified by whom. Stanford's 2024 study found Lexis+ AI hallucinated on 17% of queries and Westlaw AI-Assisted Research on 33%. Vendors claiming significantly better numbers need to show their working.
- No trial on your data. Generic demos prove nothing. If a vendor will not let you test with your actual document types and workflows, they know the demo does not match the deployment.
- Locked-in pricing with no model flexibility. If the vendor is locked to a single provider and that provider raises API costs by 40%, your bill goes up. Ask how pricing changes propagate.
The Cost Engineering Angle
Multi-model routing creates a cost optimisation opportunity that single-vendor setups cannot match. The numbers:
A simple document classification task might cost $0.002 per query on a lightweight model. The same query sent to a frontier reasoning model could cost $0.06 — thirty times more. For an organisation processing thousands of documents monthly, intelligent routing between a $0.002 classification model and a $0.06 reasoning model produces significant savings without sacrificing quality where it matters.
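The arithmetic is worth making explicit. Using the per-query figures above, and assuming a hypothetical 10,000-document monthly volume with 80% of queries routable to the cheap model (both assumptions are mine, for illustration):

```python
# Worked cost example using the per-query figures above ($0.002 vs $0.06).
# The 10,000-document volume and the 80/20 split are illustrative assumptions.

CHEAP, FRONTIER = 0.002, 0.06  # dollars per query
docs = 10_000
simple_share = 0.8  # fraction of queries the cheap model can handle

all_frontier = docs * FRONTIER
routed = docs * simple_share * CHEAP + docs * (1 - simple_share) * FRONTIER

print(f"all-frontier: ${all_frontier:.0f}/month")  # $600
print(f"with routing: ${routed:.0f}/month")        # $136
```

Under these assumptions routing cuts the monthly bill by roughly three quarters while every complex query still reaches a frontier model. The exact split matters less than the structure: the savings scale with the share of work that genuinely does not need frontier capability.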
Harvey's architecture enables exactly this. Routine tasks go to efficient models. Complex legal reasoning gets the full capability of frontier models. The user does not need to know or care. They get the right result at the right cost.
The Market Direction
Harvey's shift is not isolated. It reflects three converging trends:
Model proliferation. OpenAI, Anthropic, Google, Meta, Mistral, and others continue releasing models with distinct capability profiles. The differences between providers are not converging — they are diversifying. Each is making different trade-offs between speed, accuracy, reasoning depth, context length, and cost.
Open-weight competition. Meta's Llama and Mistral models provide alternatives that can be fine-tuned for specific applications without commercial API pricing constraints. For organisations with engineering capacity, this creates options that did not exist twelve months ago.
Commodity pressure. As baseline AI capabilities become widely available, differentiation shifts from "can the model do legal work?" to "can the model do this specific type of legal work better than the alternatives?" That shift favours architectures that match the right model to the right task.
Practical Implications
For organisations evaluating legal AI tools right now:
- Map your task portfolio first. List the specific legal tasks you want AI to support. Categorise them: reasoning, extraction, drafting, research. Different tasks may warrant different tools.
- Ask vendors which models they use. If they will not tell you, that is your answer.
- Test on your own data. Generic benchmarks measure generic performance. Your contract types, your document formats, your practice areas — that is the only benchmark that matters.
- Evaluate vendor lock-in risk. Single-model tools are simpler but create dependency. If the underlying model provider changes pricing, restricts access, or falls behind competitors, your tool's capabilities change in ways you cannot control.
- Build verification workflows that are model-agnostic. Your verification protocol should not depend on knowing which model produced the output. It should verify the output regardless of source. This is both more practical and more robust.
- Plan for the architecture to change. Whatever you deploy today will need to evolve. Build with adaptability as a design requirement, not an afterthought.
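A model-agnostic verification workflow, as recommended above, can be as simple as a list of checks applied to the output alone, with no reference to which model produced it. The specific checks below are crude placeholders; a real protocol would encode a firm's actual review criteria.

```python
# Sketch of a model-agnostic verification gate: every check inspects the
# output itself, never its source. The checks shown are crude placeholders.

def verify(output: str, checks) -> list:
    """Run every (name, check) pair against the output; return failed names."""
    return [name for name, check in checks if not check(output)]

checks = [
    ("non_empty", lambda o: bool(o.strip())),
    ("no_unresolved_placeholders", lambda o: "[TODO]" not in o),
    ("within_length_limit", lambda o: len(o) < 50_000),
]
```

Because the gate never asks where the output came from, it keeps working unchanged when the vendor adds, swaps, or upgrades models, which is exactly the adaptability the next point argues for.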
The organisations that will perform best are not the ones that picked the best model in 2026. They are the ones that built systems capable of adapting as the models improve.
Sources
- Harvey AI — Official site — Harvey's product and partnership announcements
- Harvey AI Models blog post — Harvey's announcement of multi-model integration with Anthropic and Google
- Stanford HAI — Hallucinating Law — Stanford's 2024 study on hallucination rates in legal AI tools (17% Lexis+ AI, 33% Westlaw AI-Assisted Research)
- AirDog — Autonomous drone company co-founded by Edgars Rozentals
- Meta Llama — Meta's open-weight large language model family
- Mistral AI — European open-weight model provider
Edgars Rozentals is co-founder and CTO of Twin Ladder. He builds AI systems with the same engineering discipline he applied to autonomous drones at AirDog and IoT hardware at H2YO — if the vendor cannot show the data, the claim does not ship.

