Multi-Model Architecture: What Harvey's Routing Decision Actually Means Under the Hood
Harvey started on a single OpenAI stack. By 2025, it was routing queries across OpenAI, Anthropic, and Google models. Here is what that architectural shift involves, what it costs, and why it matters for anyone evaluating legal AI tools.
In November 2022, Harvey launched as an OpenAI Startup Fund investment — essentially a GPT wrapper for legal work. One model provider, one API, one dependency. That was a defensible engineering decision at the time. OpenAI was the only viable option for production-grade legal AI, the alternatives were months behind, and time to market mattered more than architectural flexibility.
By 2025, Harvey had integrated Anthropic's Claude and Google's Gemini alongside its existing OpenAI partnership. The system now routes queries to whichever model performs best for a specific legal task type.
That is not a minor product update. It is a fundamental change in system architecture. And the engineering trade-offs involved are worth understanding, because they apply to every organisation deploying AI for professional work.
How Multi-Model Routing Works
Let me open the hood on what "multi-model architecture" actually means in practice, because the marketing language obscures the engineering.
A single-model system is straightforward. User query goes in, one model processes it, output comes back. One API, one set of rate limits, one pricing model, one set of failure modes. Simple to build, simple to debug, simple to maintain.
A multi-model system adds a routing layer between the user and the models. That layer has to make a decision: which model gets this query? The routing logic can operate on several criteria:
- Task classification. Categorise the query (reasoning, extraction, drafting, research) and send it to the model that benchmarks highest for that category.
- Context length. If the input exceeds a model's context window, route to one with a larger window.
- Cost optimisation. Simple queries go to cheaper models. Complex queries go to frontier models.
- Latency requirements. Time-sensitive tasks route to faster models.
- Cascading. Start with a cheaper model. If confidence is below threshold, escalate to a more capable one.
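The criteria above can be sketched as a small routing layer. This is a minimal illustration, not Harvey's implementation: the model names, the keyword classifier, and the route table are all placeholder assumptions standing in for a learned classifier and real provider endpoints.

```python
# Minimal sketch of a routing layer: classify the task, then pick a model.
# Model names and the keyword classifier are illustrative placeholders.

ROUTES = {
    "reasoning":  "frontier-reasoning-model",   # statutory interpretation, risk analysis
    "extraction": "long-context-model",         # due diligence, contract data extraction
    "drafting":   "fluent-writing-model",       # correspondence, memoranda
    "triage":     "fast-cheap-model",           # classification, routine queries
}

def classify(query: str) -> str:
    """Crude keyword classifier standing in for a trained task classifier."""
    q = query.lower()
    if any(w in q for w in ("interpret", "analyse", "risk")):
        return "reasoning"
    if any(w in q for w in ("extract", "clause", "diligence")):
        return "extraction"
    if any(w in q for w in ("draft", "letter", "memo")):
        return "drafting"
    return "triage"

def route(query: str) -> str:
    """Return the model a query should be dispatched to."""
    return ROUTES[classify(query)]
```

In production the classifier would itself be a model call or a trained lightweight classifier, and the route table would also weigh context length, cost, and latency, per the criteria above.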
Harvey's own description: the change provides "more optionality in selecting the best models for particular legal tasks." For most users, "this change will only be felt in results." That last part is the key engineering goal — the routing should be invisible.
The Sensor Fusion Analogy
I built autonomous drones at AirDog before DJI dominated the consumer market. The drone had to track a moving human in three dimensions — GPS, accelerometer, video feed, altitude sensor, all feeding into one navigation decision. No single sensor was sufficient. GPS drifts. Accelerometers accumulate error. Video loses the subject in poor light. The system worked because we fused multiple imperfect signals into one reliable output.
Multi-model AI architecture is the same principle. Each model is a sensor with its own strengths and blind spots:
| Model strength | Legal task fit | Weakness |
|---|---|---|
| Strong chain-of-thought reasoning | Statutory interpretation, case analysis, risk assessment | Slower, more expensive per token |
| Large context window, accurate extraction | Due diligence, contract data extraction, document classification | May produce wooden prose |
| Natural language fluency, tone matching | Client correspondence, memoranda drafting, legal writing | May introduce subtle analytical errors |
| Fast inference, lower cost | Document triage, simple classification, routine queries | Lower ceiling on complex reasoning |
No single model excels across all four categories. Routing lets you match the right model to the right task — the same way sensor fusion matches the right data source to the right navigation decision.
Single-Vendor vs. Multi-Model: The Engineering Trade-Offs
This is not a case where multi-model is simply "better." There are real trade-offs. Here is the comparison:
| Factor | Single-vendor | Multi-model |
|---|---|---|
| Implementation complexity | Low. One API, one SDK, one auth flow | High. Multiple APIs, normalised interfaces, routing logic |
| Latency | Predictable. One network hop | Variable. Routing adds decision time. Different models have different response speeds |
| Cost control | Simple. One pricing model | Complex but potentially lower overall. Route cheap tasks to cheap models |
| Output consistency | High. Same model, same style | Lower. Different models produce different tones, formats, reasoning styles |
| Vendor lock-in risk | High. If the provider raises prices or degrades quality, you are stuck | Low. Swap models without rebuilding the system |
| Debugging | Straightforward. One model to audit | Complex. Need to trace which model produced which output |
| Best-of-breed performance | Capped by one model's ceiling | Higher. Each task gets the strongest available model |
| Time to production | Faster. Less to build | Slower. Routing logic, testing across models, normalisation |
| Resilience | Single point of failure. If the provider has downtime, everything stops | Failover possible. Route around outages |
The honest assessment: multi-model delivers better results at higher engineering cost. Whether the trade-off is worth it depends on task diversity and volume.
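The resilience row in the table deserves a concrete shape. A failover wrapper is one common pattern: try the preferred provider, fall through to the next on an outage. The exception type and the provider callables here are stand-ins; real provider SDKs each have their own clients and error classes.

```python
# Sketch of provider failover: try the preferred model, fall back on outage.
# ProviderDown and the provider callables are illustrative stand-ins.

class ProviderDown(Exception):
    pass

def call_with_failover(query, providers):
    """providers: ordered list of (name, callable). Returns (name, output).

    Tries each provider in preference order; raises only if all fail.
    """
    errors = []
    for name, call in providers:
        try:
            return name, call(query)
        except ProviderDown as exc:
            errors.append((name, str(exc)))  # record and try the next provider
    raise RuntimeError(f"all providers failed: {errors}")
```

This is the "route around outages" behaviour from the table: the single-vendor setup has nowhere to fall through to.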
Pay Attention To: The Governance Problem
Here is the part that most multi-model discussions skip, and the one with the sharpest consequences for legal work.

When a single model produces an output, you know where it came from. When a routing system selects among three or four models, tracing provenance gets complicated.
Specific governance questions to ask any multi-model vendor:
- Can you log which model produced each output?
- Can you reproduce a result — same input, same model, same parameters?
- If a model is updated by its provider, do you version-track which model version was active when a specific output was generated?
- How do you handle inconsistent outputs when the same query routes to different models on different days?
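The first three questions above reduce to one engineering requirement: every output carries a reproducibility record. A minimal sketch of such a record follows; the field names are illustrative assumptions, not any vendor's schema.

```python
# Sketch of an audit record for multi-model provenance: log which model,
# which version, and which parameters produced each output.
# Field names are illustrative, not a real vendor schema.

import datetime
import hashlib

def audit_record(query: str, output: str, model: str, model_version: str,
                 params: dict) -> dict:
    """Build a record sufficient to trace and reproduce one generation."""
    return {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "model": model,                  # which model produced this output
        "model_version": model_version,  # version active at generation time
        "params": params,                # temperature etc., needed to reproduce
        "query_hash": hashlib.sha256(query.encode()).hexdigest(),
        "output_hash": hashlib.sha256(output.encode()).hexdigest(),
    }
```

Hashing the query and output rather than storing them verbatim is one way to keep an audit trail without duplicating privileged material in logs; whether that satisfies a firm's retention obligations is a separate question for counsel.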
For legal work, where accuracy is a professional obligation and errors carry liability, these are not theoretical concerns. If a due diligence report contains an error and you cannot trace which model generated it, your audit trail has a gap.
This is not a reason to avoid multi-model architectures. The performance benefits are real. It is a reason to verify that governance keeps pace with architecture.
Red Flags When Evaluating Legal AI Tools
Whether a vendor uses single-model or multi-model architecture, these signals warrant scrutiny:
- "We use the latest AI." Which model? Which version? If the vendor cannot or will not specify, they are selling marketing, not engineering.
- No model disclosure. You are entitled to know what processes your data. This is not a trade secret — it is a dependency you are inheriting.
- Accuracy claims without methodology. "95% accuracy" means nothing without knowing: accuracy at what task, measured how, on what dataset, verified by whom. Stanford's 2024 study found Lexis+ AI hallucinated on 17% of queries and Westlaw AI-Assisted Research on 33%. Vendors claiming significantly better numbers need to show their working.
- No trial on your data. Generic demos prove nothing. If a vendor will not let you test with your actual document types and workflows, they know the demo does not match the deployment.
- Locked-in pricing with no model flexibility. If the vendor is locked to a single provider and that provider raises API costs by 40%, your bill goes up. Ask how pricing changes propagate.
The Cost Engineering Angle
Multi-model routing creates a cost optimisation opportunity that single-vendor setups cannot match. The numbers:
A simple document classification task might cost $0.002 per query on a lightweight model. The same query sent to a frontier reasoning model could cost $0.06 — thirty times more. For an organisation processing thousands of documents monthly, intelligent routing between a $0.002 classification model and a $0.06 reasoning model produces significant savings without sacrificing quality where it matters.
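The arithmetic is worth making explicit. Using the per-query figures above, and assuming a hypothetical 10,000-document monthly volume with 80% of queries routable to the cheap model (both assumptions are mine, for illustration):

```python
# Worked cost example using the per-query figures above ($0.002 vs $0.06).
# The 10,000-document volume and the 80/20 split are illustrative assumptions.

CHEAP, FRONTIER = 0.002, 0.06  # dollars per query
docs = 10_000
simple_share = 0.8  # fraction of queries the cheap model can handle

all_frontier = docs * FRONTIER
routed = docs * simple_share * CHEAP + docs * (1 - simple_share) * FRONTIER

print(f"all-frontier: ${all_frontier:.0f}/month")  # $600
print(f"with routing: ${routed:.0f}/month")        # $136
```

Under these assumptions routing cuts the monthly bill by roughly three quarters while every complex query still reaches a frontier model. The exact split matters less than the structure: the savings scale with the share of work that genuinely does not need frontier capability.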
Harvey's architecture enables exactly this. Routine tasks go to efficient models. Complex legal reasoning gets the full capability of frontier models. The user does not need to know or care. They get the right result at the right cost.
The Market Direction
Harvey's shift is not isolated. It reflects three converging trends:
Model proliferation. OpenAI, Anthropic, Google, Meta, Mistral, and others continue releasing models with distinct capability profiles. The differences between providers are not converging — they are diversifying. Each is making different trade-offs between speed, accuracy, reasoning depth, context length, and cost.
Open-weight competition. Meta's Llama and Mistral models provide alternatives that can be fine-tuned for specific applications without commercial API pricing constraints. For organisations with engineering capacity, this creates options that did not exist twelve months ago.
Commodity pressure. As baseline AI capabilities become widely available, differentiation shifts from "can the model do legal work?" to "can the model do this specific type of legal work better than the alternatives?" That shift favours architectures that match the right model to the right task.
Practical Implications
For organisations evaluating legal AI tools right now:
- Map your task portfolio first. List the specific legal tasks you want AI to support. Categorise them: reasoning, extraction, drafting, research. Different tasks may warrant different tools.
- Ask vendors which models they use. If they will not tell you, that is your answer.
- Test on your own data. Generic benchmarks measure generic performance. Your contract types, your document formats, your practice areas — that is the only benchmark that matters.
- Evaluate vendor lock-in risk. Single-model tools are simpler but create dependency. If the underlying model provider changes pricing, restricts access, or falls behind competitors, your tool's capabilities change in ways you cannot control.
- Build verification workflows that are model-agnostic. Your verification protocol should not depend on knowing which model produced the output. It should verify the output regardless of source. This is both more practical and more robust.
- Plan for the architecture to change. Whatever you deploy today will need to evolve. Build with adaptability as a design requirement, not an afterthought.
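A model-agnostic verification workflow, as recommended above, can be as simple as a list of checks applied to the output alone, with no reference to which model produced it. The specific checks below are crude placeholders; a real protocol would encode a firm's actual review criteria.

```python
# Sketch of a model-agnostic verification gate: every check inspects the
# output itself, never its source. The checks shown are crude placeholders.

def verify(output: str, checks) -> list:
    """Run every (name, check) pair against the output; return failed names."""
    return [name for name, check in checks if not check(output)]

checks = [
    ("non_empty", lambda o: bool(o.strip())),
    ("no_unresolved_placeholders", lambda o: "[TODO]" not in o),
    ("within_length_limit", lambda o: len(o) < 50_000),
]
```

Because the gate never asks where the output came from, it keeps working unchanged when the vendor adds, swaps, or upgrades models, which is exactly the adaptability the next point argues for.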
The organisations that will perform best are not the ones that picked the best model in 2026. They are the ones that built systems capable of adapting as the models improve.
Sources
- Harvey AI — Official site — Harvey's product and partnership announcements
- Harvey AI Models blog post — Harvey's announcement of multi-model integration with Anthropic and Google
- Stanford HAI — Hallucinating Law — Stanford's 2024 study on hallucination rates in legal AI tools (17% Lexis+ AI, 33% Westlaw AI-Assisted Research)
- AirDog — Autonomous drone company co-founded by Edgars Rozentals
- Meta Llama — Meta's open-weight large language model family
- Mistral AI — European open-weight model provider
Edgars Rozentals is co-founder and CTO of Twin Ladder. He builds AI systems with the same engineering discipline he applied to autonomous drones at AirDog and IoT hardware at H2YO — if the vendor cannot show the data, the claim does not ship.

