Generative AI
How to Evaluate a GenAI Vendor in 2026
Feb 01, 2026

GenAI vendor pitches all sound alike. Here are specific questions on data handling, model updates, and lock-in that separate serious vendors from the rest.


GenAI vendor demos are excellent. They are produced by skilled sales engineers with preselected prompts, curated datasets, and optimised inference configurations. They demonstrate the best-case scenario for their product on data that has been chosen precisely because it performs well.

This is not dishonest. It is what demos are for. The problem is that procurement decisions made on the basis of demos — and only demos — are how organisations end up with expensive multi-year contracts for systems that do not perform on their actual data, in their actual workflows, under their actual compliance requirements.

By 2026, the GenAI vendor landscape has matured significantly. There are genuine differences between vendors on the dimensions that matter for enterprise deployment: data handling, model update policies, compliance certifications, and the practical realities of cost and lock-in. The way to surface those differences is to ask specific questions before you sign — and to require that the answers are backed by contract, not just by policy documentation that the vendor can revise unilaterally.

Here is the framework we use.

Data Handling

Does the vendor use API data to train or improve their models?

This is the most consequential data handling question, and the answer needs to be in the contract, not the privacy policy. Privacy policies are unilateral and changeable. A contract clause prohibiting training use of customer data is a legal commitment. Several major LLM providers have default terms that permit training use — often with an opt-out available at the enterprise tier. Require a contractual prohibition, and confirm that it covers not just the primary provider but any subprocessors they use.

Where is data processed and stored?

Data residency matters for Malaysian organisations on two grounds: Malaysia’s Personal Data Protection Act 2010 imposes restrictions on cross-border personal data transfers, and financial institutions subject to Bank Negara Malaysia oversight have additional data localisation considerations under the RMiT framework. Understand exactly where data goes when an API call is made — the primary data centre, the inference cluster, the logging infrastructure, and the backup systems. “Your data is processed in the Asia Pacific region” is not a sufficient answer.

What is the data retention period for API calls?

When you send a prompt and receive a response, how long does the vendor retain the request and response? For 30 days? 90 days? Indefinitely? This matters for incident response (if a sensitive document was accidentally included in a prompt, what is the window during which it is accessible at the vendor?) and for regulatory audit trails (do you need to retain your own records of AI-processed data, or can you rely on the vendor’s logs?).

Who within the vendor organisation can access our data?

Under what circumstances can vendor employees access your API data? Is this access logged? Is it auditable? Enterprise vendors should be able to answer this precisely, with reference to their internal access control policies and their SOC 2 audit coverage.

Model Update Policy

How are models updated, and with how much notice?

A model update that changes output format, reasoning behaviour, or instruction-following characteristics can break production prompts without any change on your side. For synchronous customer-facing applications, a breaking model change with 48 hours’ notice is a production incident. Acceptable enterprise model update policies include: advance notice of at least 30 days for breaking changes, a documented change log for each model version, and a defined transition period during which the previous version remains available.

Are previous model versions retained and accessible after a model update?

If a model update breaks your prompts, can you fall back to the previous version while you remediate? For how long? Some vendors retire model versions quickly; others maintain them for six to twelve months. Your architecture should account for model version pinning — always specifying the exact model version in production code, never using “latest” aliases — but this only works if the specified version remains available.
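Version pinning can be enforced in configuration rather than left to convention. The sketch below uses hypothetical model identifiers to show the pattern: production code resolves an exact, dated version (with a pinned fallback for remediation) and rejects floating aliases outright.

```python
# Model version pinning sketch. Model names are hypothetical; real
# identifiers come from the vendor's model catalogue.
MODEL_CONFIG = {
    "primary": "vendor-model-2026-01-15",   # exact, dated version
    "fallback": "vendor-model-2025-09-30",  # previous pinned version
}

def resolve_model(use_fallback: bool = False) -> str:
    """Return the pinned model identifier for an API call."""
    model = MODEL_CONFIG["fallback" if use_fallback else "primary"]
    # Guard against floating aliases sneaking into configuration.
    if "latest" in model:
        raise ValueError(f"Floating alias not allowed in production: {model!r}")
    return model
```

Flipping `use_fallback` is then a one-line operational change when an update breaks production prompts, provided the vendor still serves the pinned fallback version.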

What is the vendor’s policy on deprecating model versions?

Model deprecation is distinct from model updates. It is the point at which a model version is permanently removed and you must migrate to a different version. Reasonable enterprise deprecation policies include at least six months of advance notice and a migration guide. Require this in writing before signing.

Compliance and Certifications

SOC 2 Type II

SOC 2 Type I certifies that the vendor’s security controls exist and are designed correctly at a point in time. SOC 2 Type II certifies that those controls operated effectively over a period — typically six to twelve months. For production enterprise use, Type II is the minimum. Ask for the audit report, not just the certification letter, and read the exceptions section: every SOC 2 audit surfaces findings, and the severity and nature of those findings matters.

ISO 27001

ISO 27001 certification covers the vendor’s information security management system. It is a necessary but not sufficient credential — it demonstrates that the vendor has a structured approach to information security, but does not certify specific controls in the way SOC 2 does.

Sector-specific requirements

For Malaysian financial institutions, the relevant frameworks are BNM’s Risk Management in Technology (RMiT) policy document and, for larger institutions, DPTM (Data Protection Trust Mark) alignment. Any vendor that will process data subject to RMiT should be able to demonstrate how their controls align to the framework’s requirements — not necessarily certify against them, but map their controls to the relevant policy requirements.

DPA availability

A GDPR-compliant Data Processing Agreement should be readily available from any serious enterprise vendor — European data protection requirements have effectively set a global floor for enterprise DPA standards. A vendor that cannot produce a DPA promptly is either not enterprise-ready or has not invested in the legal infrastructure that enterprise sales require. For Malaysian personal data specifically, confirm that the DPA’s provisions are compatible with PDPA 2010 requirements.

Uptime and SLAs

What are the SLAs, and what do they cover?

99.9% monthly uptime equates to approximately 43 minutes of allowable downtime per month, or 8.7 hours per year. For a background processing pipeline that runs nightly, this may be acceptable. For a synchronous, customer-facing application, it may not be. Understand the SLA in terms of the workflows it is supporting, not just the number on the page.

Equally important: what does the SLA cover? Many vendor SLAs cover API availability but not response latency. A system that is technically “up” but responding in 30 seconds instead of 2 seconds is operationally down for latency-sensitive applications. Check whether the SLA includes latency commitments and, if so, at what percentile (p95? p99?).
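Both checks above are simple arithmetic, and it is worth running them against your own SLA numbers rather than taking the vendor's framing. A minimal sketch, using the nearest-rank method for percentiles:

```python
import math

def downtime_budget_minutes(sla: float, period_hours: float) -> float:
    """Allowable downtime, in minutes, for a given SLA over a period."""
    return (1 - sla) * period_hours * 60

def latency_percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile, e.g. p=95 for a p95 latency commitment."""
    ranked = sorted(samples)
    k = math.ceil(p / 100 * len(ranked)) - 1
    return ranked[k]
```

For example, `downtime_budget_minutes(0.999, 30 * 24)` gives roughly 43 minutes for a 30-day month, matching the figure above, and the same function over 8,760 hours yields the 8.7-hour annual budget.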

What is the historical uptime record?

Ask for the vendor’s incident history for the past 12 months — not their advertised SLA, their actual record. Most enterprise vendors publish this on a status page. The number of incidents, their duration, and the vendor’s communication behaviour during incidents are all informative.

Cost Predictability

Is pricing per token, per request, or per seat?

Token-based pricing is the most common model for LLM APIs and the most difficult to forecast accurately. An application that is used more heavily than expected — because users engage with it more deeply or submit longer documents — will cost more than budgeted. Build cost models using realistic workload estimates, with a buffer for usage growth.
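A basic cost model is easy to build once you have realistic workload estimates. The rates below are assumptions for illustration only; substitute your vendor's actual per-token pricing.

```python
# Hypothetical per-token rates for illustration; real rates vary by
# vendor, model, and tier.
INPUT_RATE_PER_1K = 0.003   # USD per 1,000 input tokens (assumed)
OUTPUT_RATE_PER_1K = 0.015  # USD per 1,000 output tokens (assumed)

def monthly_cost_estimate(requests_per_day: int,
                          avg_input_tokens: int,
                          avg_output_tokens: int,
                          growth_buffer: float = 0.3) -> float:
    """Forecast monthly API spend with a buffer for usage growth."""
    per_request = (avg_input_tokens / 1000) * INPUT_RATE_PER_1K \
                + (avg_output_tokens / 1000) * OUTPUT_RATE_PER_1K
    base = per_request * requests_per_day * 30
    return base * (1 + growth_buffer)
```

Note that output tokens are typically priced several times higher than input tokens, so workloads that generate long responses are disproportionately expensive.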

What are the rate limits at your tier, and what happens when you exceed them?

Hard stops — HTTP 429 errors — cause production failures. Graceful degradation — queuing requests, falling back to a smaller model — requires architectural planning. Know which behaviour your tier produces before you build your application architecture around it.
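The difference between a hard stop and graceful degradation can be sketched in a few lines. This is an illustrative pattern, not any vendor's SDK; the exception name and the callable interface are assumptions.

```python
import time

class RateLimitError(Exception):
    """Stand-in for a vendor SDK's HTTP 429 exception (assumed name)."""

def call_with_degradation(call, model, fallback_model,
                          max_retries=3, base_delay=1.0):
    """Retry rate-limited calls with exponential backoff, then degrade
    to a smaller fallback model rather than failing the request."""
    for attempt in range(max_retries):
        try:
            return call(model)
        except RateLimitError:
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s by default
    # Retries exhausted: degrade rather than surface a hard failure.
    return call(fallback_model)
```

The point is architectural: if your tier hard-stops at the limit, this kind of fallback path has to exist in your code before the first 429 arrives in production.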

Is there a cost cap mechanism?

Some providers offer spend limits that hard-stop API access when a monthly budget is exhausted. For cost-controlled deployments, this can be a useful guardrail. For production systems where availability is critical, it is a potential outage trigger. Understand the mechanism and design accordingly.
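One way to reconcile the guardrail with availability is to make the response to an exhausted budget depend on the workload's criticality. A minimal sketch of that decision, with thresholds chosen for illustration:

```python
def budget_action(spend: float, cap: float, critical: bool) -> str:
    """Decide the response as monthly spend approaches the cap.
    Thresholds (80% warning) are illustrative, not a vendor feature."""
    if spend < cap * 0.8:
        return "ok"
    if spend < cap:
        return "alert"  # approaching the cap: notify, do not block
    # Cap exhausted: hard-stopping a customer-facing system is an
    # outage, so availability-critical workloads alert instead.
    return "alert" if critical else "block"
```

A vendor-side spend limit applies the "block" branch unconditionally, which is exactly why it needs to be understood before it is enabled on a production system.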

Lock-In Risk

Can you export your fine-tuned models?

If you invest in fine-tuning a model on a vendor’s platform, can you take the fine-tuned model weights with you if you switch providers? Many providers do not allow this — the fine-tuned model remains on their infrastructure, and the fine-tuning investment does not travel. This is a significant switching cost consideration.

Are your prompts and configurations portable?

Prompts are portable by nature — they are text. But vendor-specific features — function calling formats, system prompt handling, structured output specifications — vary between providers in ways that can require substantial prompt engineering work to migrate. Applications built heavily on vendor-specific features are harder to move.

Is the vendor’s API compatible with the OpenAI API specification?

The OpenAI API format has become a de facto standard, and most major providers now offer OpenAI-compatible endpoints. Applications built against this standard can switch providers by changing a base URL and an API key, with minimal or no code changes. This is a meaningful reduction in switching cost and is worth prioritising when all else is roughly equal.
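In practice this means the provider-specific details can be isolated in configuration. The endpoint URLs and environment variable names below are hypothetical; the point is that switching providers touches this table, not application code.

```python
# Hypothetical OpenAI-compatible providers. Real base URLs and key
# names come from each vendor's documentation.
PROVIDERS = {
    "vendor_a": {"base_url": "https://api.vendor-a.example/v1",
                 "env_key": "VENDOR_A_API_KEY"},
    "vendor_b": {"base_url": "https://api.vendor-b.example/v1",
                 "env_key": "VENDOR_B_API_KEY"},
}

def client_config(provider: str) -> dict:
    """Build client parameters for an OpenAI-compatible endpoint."""
    cfg = PROVIDERS[provider]
    return {"base_url": cfg["base_url"], "api_key_env": cfg["env_key"]}
```

Applications that also avoid vendor-specific extensions (custom function-calling formats, proprietary structured-output modes) keep this switching cost close to zero.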

The Standard a Serious Vendor Should Clear

A vendor who cannot answer these questions clearly — with specific answers, not platitudes — is either not enterprise-ready or is hoping you will not ask. This is useful information to have before you commit.

The answers do not need to be perfect. Some vendors have 30-day model deprecation notice rather than 90 days; some have SOC 2 Type I rather than Type II; some have token-based rather than seat-based pricing. These are trade-offs, and your organisation’s specific requirements will determine which trade-offs are acceptable. What is not acceptable is vagueness. Specific answers you can evaluate are the baseline for a procurement decision you can defend.

  • The Total Cost of GenAI — The full cost structure — API, infrastructure, engineering, and human review — that must be verified and modelled during vendor evaluation.
  • Building a GenAI Centre of Excellence — How vendor evaluation fits into the CoE hub’s responsibilities, including maintaining the vetted vendor list and DPA oversight.
  • Nematix Generative AI Services — See how Nematix guides organisations through vendor selection and contract negotiation for enterprise GenAI deployments.

Find out how Nematix’s Strategy & Transformation practice can align your technology investments to business outcomes.