Building a GenAI Centre of Excellence
Centralised or federated? Platform team or embedded engineers? Here is the GenAI Centre of Excellence structure that avoids the two most common failures.
The GenAI Centre of Excellence has become the standard answer to “who owns AI in our organisation?” It shows up in every enterprise AI strategy document, every board-level AI briefing, every consulting proposal. And it is often the wrong answer — not because CoEs are a bad idea, but because most organisations build them in one of two ways that reliably fail.
Getting the structure right is not an academic question. The way an organisation structures its GenAI capability determines who gets to build AI deployments, how fast they get to production, and whether the organisation ends up with coherent, governed AI infrastructure or a dispersed collection of shadow deployments that the security team discovers six months after launch.
The Two Failure Modes
The ivory tower CoE is a central team that produces strategy documents, guidelines, proofs of concept, and “approved use case” catalogues. It is usually staffed with capable people and well-funded in its first year. It has no operational connection to the business units doing real work. Its deliverables — the frameworks, the approved vendor lists, the use case templates — are technically correct and operationally ignored.
Business units that want to move quickly learn to work around the ivory tower CoE. They engage a vendor directly, build something, and bring it to the CoE for approval after the fact — or not at all. The CoE ends up managing a backlog of pending approvals while the actual AI work happens elsewhere. The organisation has a CoE that does not know about most of the AI systems running in production.
The fractured federated model is the opposite failure. There is no centre. Every business unit does its own AI work: different LLM providers, different prompt management approaches, different evaluation criteria, different security postures. Engineering teams in finance use one vendor, marketing uses another, operations has built something on a third. The legal and data protection agreements — if they exist — are inconsistent. The security team discovers production deployments via a routine network scan. When a data breach occurs, no one can reconstruct what data each system was processing.
The fractured model produces fast initial movement — individual teams are unblocked and can ship quickly — but it creates a governance and security liability that grows with every new deployment. It also produces significant duplicated cost: each business unit negotiates its own vendor contract, builds its own observability tooling, solves the same infrastructure problems independently.
The Structure That Works
The model that consistently produces both speed and governance is a hub-and-spoke structure with clear responsibilities at each level.
The Hub: Central Platform Team
The hub is a small, technically strong central team — typically four to eight people, depending on the organisation’s size and the number of business units being served. Its job is to make it easy to deploy AI correctly, not to deploy AI for everyone else.
The hub’s core responsibilities:
- LLM gateway and shared infrastructure: a centralised API gateway that routes requests to LLM providers, enforces rate limits, logs all requests for audit purposes, and provides a single point for cost management and access control. Every business unit deployment goes through this gateway. This gives the organisation complete visibility into what is being sent to which providers.
- Vendor relationships and data protection agreements: negotiating enterprise agreements with LLM providers, ensuring DPAs are in place and compliant with Malaysia’s PDPA and any applicable international requirements, and maintaining a vetted vendor list that business units can pull from without starting a procurement process from scratch.
- AI governance policy: defining and enforcing the standards that all GenAI deployments must meet — data classification requirements, mandatory human review for specific decision types, prohibited use cases, documentation requirements.
- Shared evaluation tooling: a standard framework for evaluating LLM output quality, including automated metrics and human evaluation rubrics. Business units should not need to build this themselves.
- AI risk review process: the lightweight approval process for new deployments. Not a committee. A named reviewer, a defined process, a published turnaround time.
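The gateway responsibilities above — routing, rate limiting, and audit logging through a single choke point — can be sketched in a few lines. This is an illustrative toy, not a production gateway; the class name, limits, and log fields are assumptions chosen for the sketch.

```python
import time
from collections import defaultdict, deque

class GatewaySketch:
    """Toy LLM gateway illustrating the hub's contract: every request is
    rate-limited per business unit and recorded in a central audit log.
    All names and limits here are placeholder assumptions."""

    def __init__(self, rate_limit_per_minute=60):
        self.rate_limit = rate_limit_per_minute
        self.request_times = defaultdict(deque)  # business unit -> timestamps
        self.audit_log = []                      # who sent what to which provider

    def route(self, business_unit, provider, payload):
        now = time.time()
        window = self.request_times[business_unit]
        # Drop timestamps older than the 60-second window
        while window and now - window[0] > 60:
            window.popleft()
        if len(window) >= self.rate_limit:
            raise RuntimeError(f"rate limit exceeded for {business_unit}")
        window.append(now)
        # Central audit record: the basis for cost management and PDPA visibility
        self.audit_log.append(
            {"unit": business_unit, "provider": provider, "chars": len(payload)}
        )
        return {"provider": provider, "status": "forwarded"}
```

The design point is that the audit log and the rate limiter live in one place: a business unit cannot reach a provider without producing an audit record.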
The Spokes: Embedded AI Engineers or Champions
Each significant business unit has an embedded AI engineer or AI champion — a person who sits in the business unit, understands the domain deeply, and builds and maintains the unit’s GenAI deployments on top of the central platform.
The spoke’s core responsibilities:
- Building and iterating on business-unit-specific deployments using the central platform’s shared infrastructure
- Translating domain requirements into technical specifications — understanding the compliance workflow well enough to design a compliance monitoring tool that the compliance team will actually use
- Owning operational outcomes: when the deployment behaves unexpectedly, the spoke engineer is the first responder, not the central platform team
- Feeding back to the hub: identifying platform gaps, contributing to shared tooling, flagging governance policy edge cases
This is the critical design choice that distinguishes the working model from the ivory tower CoE. The spoke engineers are not submitting requests to the central team and waiting. They are building, with the central platform as their infrastructure. The central team’s job is to keep the platform reliable and the governance framework current — not to review every prompt or approve every deployment detail.
The Platform Team’s Minimum Viable Stack
The hub needs a coherent technical foundation to be useful. The minimum viable platform stack:
LLM gateway: LiteLLM is the most commonly deployed open-source option and supports routing across all major providers (OpenAI, Anthropic, Cohere, Azure OpenAI) behind a unified API. This means business units can switch providers without changing their integration.
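A LiteLLM proxy configuration for this multi-provider setup might look roughly like the fragment below. The model names and environment variable names are placeholders; consult the LiteLLM proxy documentation for the exact fields your version supports.

```yaml
# Illustrative LiteLLM proxy config: one alias, two interchangeable backends.
# Business units call "default-chat"; the hub decides which provider serves it.
model_list:
  - model_name: default-chat
    litellm_params:
      model: azure/gpt-4o                # placeholder deployment name
      api_key: os.environ/AZURE_API_KEY
  - model_name: default-chat
    litellm_params:
      model: anthropic/claude-sonnet     # placeholder model name
      api_key: os.environ/ANTHROPIC_API_KEY
```

Because both entries share the `default-chat` alias, swapping or load-balancing providers is a hub-side config change, not a business-unit code change.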
Shared vector infrastructure: pgvector if the organisation already has PostgreSQL infrastructure, Qdrant if it wants a dedicated vector database. The key requirement is that business units do not each provision their own.
Observability platform: Langfuse is the current leading open-source option for LLM observability — it captures traces, latency, and cost, and supports attaching human evaluation scores to individual traces. Without this, you cannot see what is happening in production.
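The data an observability platform captures per trace is worth seeing concretely. This stdlib-only sketch is not the Langfuse API — it is a toy store showing the shape of the record (latency, cost, output, attached human scores) that a real platform would hold.

```python
import time
import uuid

class TraceStore:
    """Toy trace store illustrating LLM observability data. A real platform
    (e.g. Langfuse) captures the same fields; names here are assumptions."""

    def __init__(self):
        self.traces = {}

    def record(self, name, fn, *args, cost_usd=0.0):
        """Run an LLM call (here: any callable) and capture a trace."""
        trace_id = str(uuid.uuid4())
        start = time.perf_counter()
        output = fn(*args)
        self.traces[trace_id] = {
            "name": name,
            "latency_s": time.perf_counter() - start,
            "cost_usd": cost_usd,
            "output": output,
            "scores": [],
        }
        return trace_id, output

    def score(self, trace_id, value, comment=""):
        # Human evaluation attached to an individual trace
        self.traces[trace_id]["scores"].append({"value": value, "comment": comment})
```

The key property is that human evaluation scores attach to specific traces, so a low score can always be traced back to the exact input, output, and cost that produced it.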
Evaluation framework: a standard approach to measuring output quality — RAGAS for RAG pipelines, custom rubrics for domain-specific tasks, LLM-as-judge for scalable automated evaluation. The framework matters less than the consistency of applying it.
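An LLM-as-judge evaluation reduces to a rubric prompt plus a parser that validates the judge's verdict. The sketch below stubs the judge call so it is runnable; in production the `judge` parameter would be a call through the central gateway. The rubric wording and function names are illustrative assumptions.

```python
def llm_judge_stub(prompt: str) -> str:
    """Stand-in for a real LLM call so the sketch runs offline.
    In production this would go through the hub's gateway."""
    return "4"

# Illustrative rubric: the framework matters less than applying it consistently.
RUBRIC = (
    "Rate the answer 1-5 for factual accuracy against the source.\n"
    "Source: {source}\nAnswer: {answer}\n"
    "Reply with a single digit."
)

def judge_answer(source: str, answer: str, judge=llm_judge_stub) -> int:
    """Ask the judge model to score an answer and validate its verdict."""
    verdict = judge(RUBRIC.format(source=source, answer=answer))
    score = int(verdict.strip())
    if not 1 <= score <= 5:
        raise ValueError(f"judge returned out-of-range score: {score}")
    return score
```

The validation step matters: judge models occasionally return prose instead of a digit, and a shared framework should fail loudly rather than record garbage scores.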
Prompt library: a version-controlled, searchable repository of tested prompts and prompt templates. This prevents each team from solving the same prompting problems independently and ensures that prompt changes are tracked the same way code changes are.
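The contract a prompt library offers business units is small: register a new version, fetch the latest, or pin an old one. In practice the prompts would live in Git; this in-memory sketch (all names are illustrative) just shows that lookup contract.

```python
class PromptLibrary:
    """Toy versioned prompt registry. Versions are append-only, so a prompt
    change is tracked the same way a code change would be."""

    def __init__(self):
        self._prompts = {}  # name -> list of template versions

    def register(self, name, template):
        """Add a new version of a prompt; returns the 1-based version number."""
        self._prompts.setdefault(name, []).append(template)
        return len(self._prompts[name])

    def get(self, name, version=None):
        """Fetch a specific version, or the latest if none is pinned."""
        versions = self._prompts[name]
        return versions[(version or len(versions)) - 1]
```

A usage sketch: a team registers `"summarise"` twice, production code calls `get("summarise")` for the latest, and a deployment that needs stability pins `version=1`.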
Deployment approval checklist: a one-page document that every deployment must complete before going to production. Four questions: What data does this system process? Who is the operational owner? What is the human review mechanism? How will success be measured?
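The four-question checklist is simple enough to encode directly, which is also how a hub team might automate the "approve or return" decision described later. Field names below are illustrative assumptions, not a standard schema.

```python
# The four checklist questions, as machine-checkable fields (names assumed)
REQUIRED_ANSWERS = (
    "data_processed",     # what data does this system process?
    "operational_owner",  # who is the operational owner?
    "human_review",       # what is the human review mechanism?
    "success_metric",     # how will success be measured?
)

def review_deployment(submission: dict):
    """Return (approved, missing). A submission with gaps is returned with
    the specific unanswered questions, not rejected outright."""
    missing = [k for k in REQUIRED_ANSWERS if not submission.get(k)]
    return (len(missing) == 0, missing)
```

Returning the list of missing answers, rather than a bare rejection, mirrors the reviewer-as-resource stance: the submitter knows exactly what to resolve.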
The Governance Process Without Bureaucracy
The most common objection to centralised AI governance is speed. If every deployment needs approval, deployment velocity slows to the approval process’s pace. This is the right concern applied to the wrong solution.
The AI deployment review should not be a committee process. It should be a named reviewer — a single person, with a deputy — who has authority to approve or return a submission for revision. The turnaround time should be defined and published: 48 hours for straightforward deployments, five business days for deployments processing sensitive personal data or making consequential decisions.
The review is not a technical audit. It is a governance check: does this deployment have a named owner? Does it have a documented human review mechanism for out-of-confidence outputs? Is the data it processes covered by an existing DPA, or does a new DPA need to be executed before deployment?
A deployment that answers all four checklist questions clearly should be approved in 48 hours. A deployment that cannot answer them clearly should be returned — not rejected, returned — with specific questions that need to be resolved. The reviewer is a resource, not a gatekeeper.
Staffing the Hub Versus the Spokes
The skills required in the hub and the spokes are meaningfully different, and conflating them leads to hiring mistakes.
Hub engineers need strong infrastructure and MLOps capabilities: experience deploying and operating distributed systems, understanding of cloud networking and security, ability to write and maintain policy. They need to care about platform reliability and developer experience — the spoke engineers are their customers, and a platform that is difficult to use will be worked around.
Spoke engineers or champions need domain knowledge as much as technical skill. The most effective spoke AI champion in a compliance team is typically someone who started in compliance and developed GenAI skills, not a software engineer who developed compliance knowledge. Domain credibility matters: the compliance team will engage with someone who understands their workflows in a way they will not engage with a technologist who is learning compliance while building the tool.
The CoE’s Job Is to Make the Business Units Successful
The best way to assess whether a GenAI CoE is working is to ask the business units. Do they find it easy to build and deploy GenAI on the central platform? Do they understand the governance requirements and agree they are reasonable? Do they bring the hub team into their planning early, or do they regard the approval process as an obstacle to route around?
When the platform team becomes the bottleneck — when business units are waiting weeks for approvals, or building workarounds because the central platform does not support what they need, or submitting sanitised descriptions that do not reflect what is actually deployed just to clear the review process — the CoE has failed. Not because the policy was wrong, but because the implementation made compliance harder than non-compliance.
The CoE’s job is not to own AI for the organisation. It is to make it possible for the business units to own AI responsibly. When it succeeds, the business units are the ones building and operating the deployments, and the hub team is the infrastructure and governance layer that makes it safe for them to do so.
Related Reading
- Going GenAI-Native: Lessons from Two Years in Production — What the CoE structure looks like two years into operation, through the lens of organisations that successfully compounded GenAI value.
- How to Evaluate a GenAI Vendor in 2026 — Vendor selection and DPA negotiation are among the CoE hub’s first mandates; this framework covers what to require before signing.
- Nematix Generative AI Services — See how Nematix supports the design and launch of GenAI Centres of Excellence across enterprise organisations.
Find out how Nematix’s Strategy & Transformation practice can align your technology investments to business outcomes.