GenAI and PDPA Compliance in Malaysia
Sending customer data to an LLM API triggers PDPA obligations. Malaysian businesses must address them before any personal data reaches OpenAI, Anthropic, or any other overseas provider.
When a Malaysian company sends a customer’s personal data to an LLM API — OpenAI in the United States, Anthropic in the United States, Google’s infrastructure wherever it happens to be processing — that is a transfer of personal data to a third party in a foreign jurisdiction. Under Malaysia’s Personal Data Protection Act 2010 (PDPA), that transfer carries obligations. Most teams doing this routinely have not thought through those obligations.
This is not a theoretical concern. The PDPA applies to any person who processes personal data in the course of a commercial transaction. If your team is sending customer names, contact details, transaction histories, complaint records, or any other personal data to an LLM API to summarise, classify, draft responses to, or otherwise process, you are processing personal data through a third-party system in an overseas jurisdiction. The question is not whether this triggers PDPA obligations — it does. The question is whether your organisation has met them.
The Cross-Border Transfer Obligation
Section 129 of the PDPA restricts the transfer of personal data outside Malaysia to countries not specified in an approved list. The mechanism mirrors the approach taken by the EU’s GDPR and the UK’s Data Protection Act: a whitelist of jurisdictions deemed to provide adequate protection, with alternative safeguards required for transfers to jurisdictions not on the list.
The practical reality is that the Personal Data Protection Commissioner has not gazetted a comprehensive approved-countries list, and enforcement of the territorial transfer restriction has been limited to date. This does not mean the obligation does not exist. It means it has not been actively tested. The 2024 amendment to the PDPA strengthened enforcement powers and increased penalties significantly; the conditions for active enforcement of the cross-border transfer provisions are better in 2026 than they were in 2020.
The United States, where OpenAI and Anthropic are headquartered, is not on any Malaysian approved-countries list. The European Union’s major cloud infrastructure regions are similarly not specifically gazetted. The prudent approach — which is also the approach that will hold up under regulatory scrutiny if a data incident occurs — is to treat any LLM API call that transmits personal data as a cross-border transfer requiring appropriate safeguards, regardless of whether active enforcement has historically been applied.
Appropriate safeguards for cross-border transfers under PDPA include: obtaining the consent of the data subject to the specific overseas transfer, putting in place contractual protections requiring the overseas recipient to protect the data to the PDPA’s standards, or satisfying the Commissioner that adequate protection is provided. Of these, the contractual route — a binding data processing agreement with the LLM API provider — is the most practical for an organisation processing customer data at scale.
Lawful Basis for Processing
Before the cross-border transfer question, there is a more fundamental one: what is the lawful basis for processing your customers’ personal data through an LLM system at all?
The PDPA’s General Principle requires consent for the processing of personal data unless a specified exemption applies. The processing exemptions most commonly relevant to commercial AI applications are: processing necessary for the performance of a contract with the data subject, processing necessary to comply with a legal obligation, and processing for the legitimate interests of the data user or a third party.
Consent as a lawful basis for GenAI processing is often impractical for existing customer relationships — you cannot retrospectively obtain specific consent for GenAI processing from your entire customer base. For new customer relationships, collecting consent at the point of onboarding is feasible, but the consent must be specific: it must clearly describe that personal data will be processed using AI systems, including third-party AI services, for the purposes stated.
Legitimate interest is the most practical lawful basis for most commercial GenAI processing. The legitimate interest exemption permits processing without consent where the data user has a legitimate interest, the processing is necessary for that interest, and those interests are not overridden by the interests, rights, or freedoms of the data subject. The critical step is documenting the legitimate interest assessment: writing down the interest pursued, why processing through an LLM is necessary to achieve it, and why the data subject’s interests do not override it in this context. This documentation is what evidences a lawful basis to a regulator.
Data Processing Agreements with LLM Providers
Before sending any personal data to a third-party LLM API, a Data Processing Agreement (DPA) must be in place with that provider. A DPA is a binding contractual instrument governing what the processor — the LLM API provider — is authorised to do with personal data it processes on your behalf.
The minimum content of a PDPA-adequate DPA includes:
- a specification of the processing the provider is authorised to perform;
- an explicit prohibition on using the personal data for any purpose other than providing the contracted service, specifically including a prohibition on using your data to train or improve the provider's models;
- an obligation to notify you within a defined timeframe, aligned with the PDPA's 72-hour breach notification window, if a data breach involving your data occurs;
- controls on sub-processors (the provider's own third-party suppliers), including a requirement to impose equivalent obligations on any sub-processor; and
- deletion or return of data on termination of the service.
OpenAI, Anthropic, and Google all offer DPA documents for their API services. These are not automatically agreed to when you sign up for API access — they typically require explicit agreement, and in some cases a specific enterprise contract tier. Review what you have agreed to before treating an existing API relationship as DPA-covered.
Pay particular attention to the model training clause. The default terms for consumer-facing products often permit use of inputs and outputs for model improvement. The API terms for commercial customers are generally more protective — but “generally” is not a contractual guarantee. Read the relevant clause and ensure it explicitly prohibits use of your data for training. If the existing DPA does not contain this prohibition, raise it with the provider before you process personal data through their system.
The Model Training Risk
The most significant PDPA concern specific to GenAI — and one that is frequently overlooked by teams moving quickly to deploy AI tools — is the risk that personal data sent to an LLM API is used to train or improve the provider’s models.
If a provider uses your customers’ personal data to improve their commercial AI model, that data has been used for a purpose that was almost certainly not disclosed to your customers at the point of data collection. Under PDPA, processing personal data for a purpose beyond the original collection purpose requires either a new consent or a legitimate basis for the extended processing. The provider’s commercial interest in improving their model does not constitute a legitimate interest that overrides your customers’ data protection rights.
The model training risk also has a secondary dimension: if personal data is incorporated into a model’s training data, it may influence that model’s outputs in ways that are difficult or impossible to audit, and the data cannot be meaningfully deleted from the model once training is complete. This creates a data retention obligation that cannot be fulfilled through conventional deletion.
The contractual control is the primary safeguard: a DPA that explicitly prohibits training use, with audit rights. The technical control is data minimisation — sending as little personal data as possible to the API, which limits the exposure if the contractual protection is not honoured or is insufficient.
Data Minimisation in Practice
Data minimisation is one of the PDPA’s seven data protection principles, and it is directly actionable in GenAI system design.
The principle is straightforward: only process the personal data that is actually necessary for the task at hand. For GenAI applications, this translates to a specific design requirement: before the data is assembled into a prompt or passed to an LLM API, strip or mask any personal data fields that are not necessary for the model to perform its function.
A worked example: a customer service GenAI application that summarises support ticket history to help an agent respond to an inbound contact needs the ticket content, the issue category, and the resolution history. It does not need the customer’s NRIC number, date of birth, or residential address. Those fields should be stripped from the data before it is assembled into the LLM context, not passed along by default because they happen to exist in the customer record.
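The worked example above can be sketched as a pipeline-stage allowlist. This is a minimal sketch, not a prescribed implementation: the field names and flat-record shape are hypothetical, and the point is simply that minimisation runs before any prompt is assembled or API call made.

```python
# Sketch of pipeline-stage data minimisation. Field names are
# hypothetical, not taken from any specific system.

ALLOWED_FIELDS = {"ticket_content", "issue_category", "resolution_history"}

def minimise_record(record: dict) -> dict:
    """Return only the fields the LLM task actually needs.

    NRIC, date of birth, address, and anything else not on the
    allowlist never reaches the prompt or the API call.
    """
    return {k: v for k, v in record.items() if k in ALLOWED_FIELDS}

customer_record = {
    "ticket_content": "Customer reports a failed transfer on 12 May.",
    "issue_category": "payments",
    "resolution_history": "Escalated to payments team; refund issued.",
    "nric": "900101-14-5678",          # not needed for summarisation
    "date_of_birth": "1990-01-01",     # not needed for summarisation
    "address": "12 Jalan Contoh, KL",  # not needed for summarisation
}

prompt_context = minimise_record(customer_record)
```

An allowlist, rather than a blocklist of known sensitive fields, is the safer default: a new field added to the customer record is excluded until someone deliberately decides the model needs it.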
This is both a compliance requirement and a risk management decision. A data minimisation failure — passing more personal data to an LLM API than necessary — increases the severity of a potential breach, the difficulty of subject access request compliance, and the surface area for training data misuse. The technical cost of implementing data minimisation at the pipeline stage is low. The compliance and risk cost of not doing so is high.
On-Premise and Private Cloud Alternatives
For organisations with high sensitivity to cross-border transfer risk — Malaysian financial institutions with strict data localisation requirements under BNM’s RMiT policy, government-linked organisations handling sensitive citizen data, healthcare providers processing medical records — open-source LLMs deployed on Malaysian or Singapore cloud infrastructure offer a substantively different risk profile.
Models in the Llama 3 family, Mistral, and Qwen are production-quality open-source models that can be deployed on infrastructure you control. Deployment on Microsoft Azure Malaysia (Kuala Lumpur data centre) or AWS Asia Pacific (Malaysia) keeps the data inside Malaysia and eliminates the cross-border transfer concern entirely. A Singapore region (AWS Asia Pacific or Google Cloud Asia Southeast) is still a cross-border transfer under the PDPA, but to infrastructure you control in a jurisdiction with a mature data protection regime, which is a materially lower-risk position than a third-party API in the United States.
The tradeoffs are real. Operating an LLM on your own infrastructure requires engineering capability — model serving, hardware provisioning, operational monitoring, model update management — that a managed API service abstracts away. Open-source models are generally behind the frontier closed models on complex reasoning tasks, though the gap has closed significantly since 2023. For many business use cases — document summarisation, classification, extraction, structured data generation — the open-source models are entirely adequate.
The decision framework is driven by risk profile and use case complexity. High-sensitivity data, regulated use cases with strict localisation requirements, or organisations with strong existing infrastructure capability: the on-premise or private cloud route is worth the operational investment. Lower-sensitivity data, standard business productivity use cases, or organisations without infrastructure capability: managed APIs with appropriate DPAs are the practical choice.
Doing This Right Is Not Prohibitively Complex
PDPA compliance for GenAI is achievable. The steps are not exotic, and they do not require prohibitive time or investment. They require deliberate attention before processing begins rather than after a problem occurs.
The checklist:
- establish a documented lawful basis for each category of personal data you process through a GenAI system;
- execute a DPA with every LLM API provider before sending personal data through their system;
- verify that the DPA explicitly prohibits model training use;
- implement data minimisation at the pipeline stage: strip fields that are not necessary for the task;
- configure audit logging for inputs and outputs, with defined retention periods aligned with PDPA requirements;
- build a breach notification procedure that can meet the 72-hour notification window; and
- review the whole picture annually and whenever your AI stack materially changes.
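Of the checklist items, audit logging is the most mechanical to implement. The sketch below assumes a JSON-lines log file; the path, field names, and hashing choice are illustrative, not a PDPA-mandated format. Storing hashes rather than raw prompts keeps the audit trail itself from becoming a secondary store of personal data.

```python
import hashlib
import json
from datetime import datetime, timezone

def log_llm_call(log_path: str, purpose: str, prompt: str, response: str) -> dict:
    """Append one audit record for an LLM API call as a JSON line.

    Hashes of the prompt and response are stored instead of raw text,
    so the log evidences what was sent without retaining the personal
    data itself.
    """
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "purpose": purpose,
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "response_sha256": hashlib.sha256(response.encode("utf-8")).hexdigest(),
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return entry
```

With an append-only log like this, the retention requirement becomes a scheduled job that deletes lines older than the documented retention period, rather than an ad-hoc decision.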
The organisations we see making avoidable mistakes on PDPA compliance for GenAI are the ones that moved fast on the technology decision — choosing the LLM, building the application — and treated data governance as a follow-on task. Data governance is not a follow-on task for a system processing personal data. It is a prerequisite. The API call should not happen until the DPA is signed and the lawful basis is documented.
Related Reading
- Malaysia AI Governance: PDPA, MyDIGITAL, and What’s Next — The broader governance landscape that PDPA sits within, including BNM guidance and the anticipated Malaysian AI Act consultation.
- Automating KYC Document Processing with GenAI — A high-PDPA-risk use case examined in detail, showing how data minimisation and DPA requirements apply in practice to real KYC workflows.
- Nematix Generative AI Services — See how Nematix builds GenAI systems with PDPA compliance designed in from the start, not retrofitted after deployment.
Find out how Nematix’s Strategy & Transformation practice can align your technology investments to business outcomes.