
Implementation Guides

How to Evaluate Your AI Vendor in 30 Minutes (And Why It's Not Enough)

We are giving away the prompts, the templates, and the methodology. Use them. Then discover why the hard part was never asking the right questions.

March 13, 2026 · Alex Blumentals, Founder & CEO · 10 min read


I have spent years inside procurement organisations watching the same knowledge asymmetry play out. The seller is focused on one product, one category. They craft sophisticated narratives, create apparent diversity, and know their data perfectly. The buyer has to purchase thousands of things across hundreds of categories and does not have the specialisation to see through all the sales material.

AI vendor evaluation is this asymmetry at its most dangerous. Your vendor has a dedicated compliance team producing beautiful transparency reports. You have a procurement analyst, a DPO, and a legal counsel -- none of whom have evaluated an AI system against the EU AI Act before. Article 26 requires you to verify your vendor's compliance. Article 27 may require you to produce a Fundamental Rights Impact Assessment. The deadline is August 2, 2026.

So here is what we are going to do. We are going to give you everything you need to do this yourself.


The prompts

Below are six prompts you can use with any large language model -- Claude, GPT, Gemini, Llama, whichever you prefer. Upload your vendor's documentation (transparency report, model card, terms of service, DPA) and run these prompts. Each one maps to a pillar of the Twin Ladder Standard and an Article 26 deployer obligation.

Prompt 1: Awareness and Transparency

"I am a deployer of this AI system under the EU AI Act. Analyse this vendor's documentation and answer: (1) Does the vendor clearly explain what the AI system does, what data it uses, and what outputs it produces? (2) Does the documentation explain which EU AI Act risk classification applies and why? (3) Does it explain what information and documentation the vendor provides to help me fulfil my Article 26 deployer obligations? Flag any vague language, missing information, or claims that are not supported by evidence."

Prompt 2: Data Protection and Policy

"Analyse this vendor's data processing documentation. Answer: (1) What personal data does the system process, for what purposes, and under what legal basis? (2) Does the vendor use customer data for model training or fine-tuning -- and can I opt out? (3) Where is data stored and processed, and are there cross-border transfers? (4) What are the data retention and deletion policies? (5) Has the vendor conducted a DPIA for this system? If not, why? Flag any gaps against GDPR Articles 13-14 (transparency) and Articles 35-36 (DPIA)."

Prompt 3: Training and Competence Support

"Analyse what training and support this vendor provides. Answer: (1) What training materials exist for staff who will use or oversee this system? (2) Is there role-specific training for different stakeholders (end users, administrators, compliance officers)? (3) Does the training cover system limitations and failure modes -- not just features? (4) How does the vendor assess whether deployer staff are competent to use the system? Flag any training that looks like product tutorials rather than competence-building."

Prompt 4: Tools and Human Oversight

"Analyse the human oversight capabilities documented for this AI system. Answer: (1) What tools does the vendor provide for monitoring AI decisions and outputs? (2) What logging and audit trail capabilities exist? (3) Can the system be paused, overridden, or shut down by the deployer? (4) What bias detection and fairness monitoring tools are included? (5) What performance monitoring tracks accuracy, reliability, and drift over time? Compare these against Article 14 (human oversight) requirements."

Prompt 5: Evidence and Documentation

"Analyse the evidentiary documentation this vendor provides. Answer: (1) Is there a model card or equivalent technical documentation covering training data, architecture, and known limitations? (2) What certifications, standards, or third-party audits has the system undergone? (3) What testing and validation evidence exists for accuracy, fairness, and robustness? (4) What incident response procedures are documented? (5) What version control and change management processes exist for model updates? Rate the documentation against Article 11 (technical documentation) and Article 13 (transparency) requirements."

Prompt 6: Governance and Accountability

"Analyse this vendor's governance framework. Answer: (1) Who has ultimate accountability for AI system safety? (2) What risk management processes exist for the full AI lifecycle? (3) Are there ethical review processes, red lines, or use-case restrictions? (4) What contractual commitments exist around performance, accuracy, and compliance -- any SLAs or warranties? (5) How does the vendor handle audit requests and due diligence assessments? (6) What is the approach to liability and indemnification for AI outputs? Flag any governance gaps against Article 9 (risk management) requirements."
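The six prompts can be run by hand, but if you are evaluating several vendors it helps to script the loop. Below is a minimal, hypothetical harness: `PILLAR_PROMPTS` holds abbreviated placeholders (paste in the full prompt texts from above), and `call_llm` is a stand-in for whichever model client you actually use -- nothing here is tied to a specific API.

```python
# Sketch of a harness for running the six pillar prompts against vendor
# documentation. Prompt texts are abbreviated placeholders; paste the full
# versions from this article. `call_llm` is a stand-in for your model client.

PILLAR_PROMPTS = {
    "Awareness and Transparency": "I am a deployer of this AI system under the EU AI Act...",
    "Data Protection and Policy": "Analyse this vendor's data processing documentation...",
    "Training and Competence Support": "Analyse what training and support this vendor provides...",
    "Tools and Human Oversight": "Analyse the human oversight capabilities documented...",
    "Evidence and Documentation": "Analyse the evidentiary documentation this vendor provides...",
    "Governance and Accountability": "Analyse this vendor's governance framework...",
}

def build_request(pillar: str, vendor_docs: str) -> str:
    """Combine one pillar prompt with the vendor's uploaded documentation."""
    return f"{PILLAR_PROMPTS[pillar]}\n\n--- VENDOR DOCUMENTATION ---\n{vendor_docs}"

def run_evaluation(vendor_docs: str, call_llm) -> dict:
    """Run all six prompts and collect one analysis per pillar."""
    return {pillar: call_llm(build_request(pillar, vendor_docs))
            for pillar in PILLAR_PROMPTS}
```

Keeping the pillar names as dictionary keys means the output of each run lines up directly with the scoring step that follows.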


The scoring

After running these prompts, score each pillar on a 0-3 scale:

  • 0 -- No evidence. The vendor provides nothing addressing this area.
  • 1 -- Partial. Some information exists but is vague, incomplete, or not specific to the AI system.
  • 2 -- Adequate. Clear, specific documentation that addresses the core requirements.
  • 3 -- Comprehensive. Detailed, evidence-backed documentation with proactive commitments.

Calculate the percentage: divide your total score by the maximum of 18 and multiply by 100. Then map the result to the Twin Ladder Standard levels:

| Score | Level | What it means |
|---|---|---|
| 0--25% | Exploring | Significant compliance risk. Do not proceed without a vendor remediation plan. |
| 26--50% | Developing | Material gaps. Acceptable only for minimal-risk deployments with additional controls. |
| 51--75% | Implementing | Good compliance posture. Deploy with documented oversight measures. |
| 76--100% | Optimising | Strong compliance posture. Evidence of proactive governance. |
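The scoring step fits in a few lines of code; the thresholds below mirror the level table exactly, with six pillar scores on the 0-3 scale converted to a percentage of the 18-point maximum.

```python
# Convert six 0-3 pillar scores into a percentage of the 18-point maximum
# and map the result to a Twin Ladder Standard level.

LEVELS = [  # (upper bound in %, level name)
    (25, "Exploring"),
    (50, "Developing"),
    (75, "Implementing"),
    (100, "Optimising"),
]

def score_to_level(pillar_scores: list[int]) -> tuple[float, str]:
    """Return (percentage, level) for six pillar scores on the 0-3 scale."""
    assert len(pillar_scores) == 6 and all(0 <= s <= 3 for s in pillar_scores)
    pct = sum(pillar_scores) / 18 * 100
    for upper_bound, name in LEVELS:
        if pct <= upper_bound:
            return pct, name
    raise ValueError("percentage out of range")
```

For example, pillar scores of [2, 1, 2, 3, 1, 2] total 11 out of 18, which is roughly 61% -- the Implementing level.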

The templates

For a more structured evaluation, we provide three complete RFI templates -- 36 to 48 questions each, with scoring criteria and guidance for every question:

  1. General AI Vendor RFI -- 36 questions across 6 pillars. Works for any AI vendor.
  2. High-Risk AI Vendor RFI -- 48 questions, including 12 specific to Annex III high-risk systems and FRIA preparation.
  3. HR AI Tool RFI -- 48 questions, including 12 specific to employment and recruitment AI under Annex III Category 4.

Every question is mapped to a specific Article 26 deployer obligation and scored against the Twin Ladder Standard pillars. Every question includes guidance on what good, adequate, and poor answers look like.

These templates are free. Register for a Twin Ladder account and download them immediately. No trial period. No upsell conversation. They are yours.


Now here is why it is not enough

I would be dishonest if I stopped here. The prompts work. The templates are thorough. If you run them diligently, you will know more about your AI vendor's compliance posture than 90% of European mid-market companies know today.

But there are five things that no prompt, no template, and no LLM can give you -- regardless of how good the model is.

1. Context-specific evaluation

An LLM analyses the document you give it. It does not know your organisation. It does not know that you are deploying this HR screening tool in Latvia, where the intersection of EU AI Act Article 26 and Latvian employment law creates specific obligations that differ from deploying the same tool in Germany or France. It does not know that your HR team has three people and no dedicated AI oversight role -- which changes what "adequate human oversight" means in practice.

The same vendor documentation produces a different compliance verdict depending on who is deploying, where, for what use case, and with what internal capabilities. An LLM treats the document as context-complete. It is not.

2. Competence calibration

Your organisation's Twin Ladder maturity score determines what oversight measures you actually need. A Level 1 team (Exploring) deploying a high-risk AI tool needs fundamentally different controls than a Level 3 team (Implementing) deploying the same tool. The vendor's documentation might be adequate for one and insufficient for the other.

This is the Article 4 principle at work. AI literacy is not a checkbox -- it is a capability that shapes what "appropriate use" means. An LLM cannot calibrate vendor requirements against your people's actual competence level because it does not know your people.

3. Comparative benchmarks

When we tell you that a vendor scored in the 35th percentile for training documentation, that number means something. It means we have evaluated enough vendors to build a distribution. It means we know what good looks like -- not in theory, but from data across hundreds of evaluations.

An LLM has no benchmark. It can tell you the documentation "appears comprehensive" or "has gaps." It cannot tell you whether those gaps are typical for the industry or catastrophically below standard. Without comparative data, every evaluation is an island.
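To make the point concrete: a percentile rank is just a position within a distribution of scores, so it cannot exist without comparison data. The sketch below shows the mechanics; the vendor scores in the usage note are invented for illustration and are not Twin Ladder benchmark data.

```python
# A percentile rank only exists relative to a population of scores.
# Without a distribution of prior evaluations, there is nothing to rank against.
from bisect import bisect_left

def percentile_rank(score: float, population: list[float]) -> float:
    """Percentage of previously evaluated scores strictly below `score`."""
    ordered = sorted(population)
    return bisect_left(ordered, score) / len(ordered) * 100
```

With ten illustrative prior scores [30, 40, 50, 60, 70, 80, 90, 95, 65, 45], a vendor scoring 55 lands at the 40th percentile. Change the population and the same score gets a different rank -- which is exactly why an isolated document review cannot produce one.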

4. Legal defensibility

If a regulator asks you to demonstrate due diligence under Article 26, what will you show them? A ChatGPT conversation? A Claude transcript?

A Twin Ladder Vendor Compliance Report is a structured document mapped to specific Article 26(1) deployer obligations, scored against an open, auditable methodology (the Twin Ladder Standard, CC BY-SA 4.0), and -- for Pro tier -- reviewed and signed off by a qualified assessor. It is designed to be evidence. An LLM conversation is not.

5. The FRIA bridge

If your AI system is high-risk under Annex III, Article 27 requires a Fundamental Rights Impact Assessment before deployment. This is not optional for public bodies, public service providers, or organisations using AI for credit scoring or insurance.

A FRIA is not a document you can generate from a prompt. It requires identifying affected persons, mapping Charter of Fundamental Rights implications, conducting a structured risk analysis with likelihood and severity scoring, planning mitigation measures, and documenting oversight mechanisms. The vendor evaluation data feeds directly into the FRIA -- but only if the evaluation was structured to capture the right inputs.

Our FRIA workflow takes the vendor evaluation data you already produced and pre-populates 30--40% of the FRIA requirements. No prompt can do this because the connection between vendor evaluation and FRIA is architectural, not textual.


The honest proposition

Here is what we are offering, and why.

Free tier (registered users):

  • Six prompts (above) -- use with any LLM
  • Three RFI templates (36--48 questions each) with scoring criteria
  • FRIA eligibility checker -- six questions to determine if you need a Fundamental Rights Impact Assessment

We give these away because the methodology should be accessible. The Twin Ladder Standard is open (CC BY-SA 4.0). We believe the framework should be a public good. If you can do this yourself, do it.

Paid tier (when you need more):

  • Article 26 Compliance Report (from €490) -- we evaluate the vendor, produce the report, map to your specific obligations
  • Article 27 Impact Assessment (from €1,500) -- guided 7-step FRIA workflow, pre-populated from evaluation data
  • Deployer Compliance Kit (from €2,500) -- both in one integrated workflow
  • Deployer Compliance Kit Pro (from €5,000) -- with expert assessor review and legal sign-off

The paid tier exists because context, calibration, benchmarks, legal defensibility, and FRIA integration are things that require a platform, not a prompt. We are not hiding the methodology. We are offering to apply it -- with your organisation's specific context, against our accumulated benchmark data, producing documents that stand up to regulatory scrutiny.


What to do right now

  1. Take the free FRIA eligibility check. Two minutes. Know whether Article 27 applies to you.
  2. Register and download the RFI templates. Run them against your most critical AI vendor.
  3. Use the prompts above on your vendor's documentation. Score it. See where the gaps are.
  4. If you need more -- contextual evaluation, comparative benchmarks, legal evidence, FRIA documentation -- start a vendor evaluation.

The deadline is August 2, 2026. Your vendors have documentation. The question is not whether you can read it. The question is whether you can evaluate it -- for your people, your workflows, your regulatory position.

That is the gap an LLM cannot close.


Sources

  1. Regulation (EU) 2024/1689, Article 26 -- Obligations of deployers of high-risk AI systems, including verification of provider compliance, human oversight, and documentation requirements. EUR-Lex

  2. Regulation (EU) 2024/1689, Article 27 -- Fundamental rights impact assessment for high-risk AI systems deployed by public bodies, public service providers, and credit/insurance scoring. EUR-Lex

  3. Regulation (EU) 2024/1689, Article 4 -- AI literacy obligation requiring providers and deployers to ensure sufficient AI literacy of staff. EUR-Lex

  4. Twin Ladder Standard v1.0 -- Open methodology (CC BY-SA 4.0) for assessing organisational AI competence across six pillars. Twin Ladder Standard

  5. EU Charter of Fundamental Rights -- Articles 1, 7, 8, 21, 31, 41, 47 -- referenced in Article 27 FRIA requirements for rights mapping. EUR-Lex