GenAI in Healthcare: Clinical Documentation and Coding
GenAI is transforming clinical documentation and medical coding. The safety threshold in healthcare differs from that of every other industry. Here is what works.
A clinician in a Malaysian public or private hospital spends roughly 30 to 40% of their working time on documentation. Clinical notes, discharge summaries, referral letters, progress notes, medication reconciliation, operative reports — the administrative burden of a clinical role is not a minor inconvenience. It is a material portion of a working life, performed largely after the clinical encounter itself, in the hours when a physician might otherwise be seeing additional patients or resting.
GenAI can reduce that burden significantly. The technology exists and is in use in healthcare settings across Malaysia, Singapore, and the broader region. Whether a specific deployment is safe depends not on the technology but on where in the clinical workflow it sits — on which side of the line that separates documentation and coding from diagnosis and treatment.
That line is the structuring principle for every GenAI deployment decision in healthcare.
Use Case 1: Clinical Documentation — Ambient Scribing
The most impactful GenAI application in clinical settings, and the one with the clearest safety profile, is ambient scribing: recording the patient-clinician encounter and generating a structured clinical note for physician review.
The workflow is straightforward. A microphone — typically a dedicated device or a mobile application — captures the audio of a consultation or ward round. An automated speech recognition (ASR) layer transcribes the audio. A language model processes the transcript to generate a structured clinical note in the format required by the institution: subjective, objective, assessment, plan (SOAP); problem-oriented medical records (POMR); or a custom template. The physician reviews the generated note, makes corrections and additions, and signs it as the official record.
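The review-and-sign gate at the end of that workflow can be made explicit in code. This is a minimal sketch, not any vendor's API: `transcribe` and `draft_note` are hypothetical placeholders for the ASR and LLM stages, and the `SoapNote` fields mirror the SOAP structure described above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SoapNote:
    subjective: str
    objective: str
    assessment: str
    plan: str
    signed_by: Optional[str] = None  # set only after physician sign-off

def transcribe(audio: bytes) -> str:
    """Placeholder for the ASR layer (e.g. an on-premise speech model)."""
    return "Patient reports three days of cough, no fever..."

def draft_note(transcript: str) -> SoapNote:
    """Placeholder for the LLM step that structures the transcript."""
    return SoapNote(
        subjective="Three days of cough, no fever reported",
        objective="Chest clear on auscultation",
        assessment="Likely viral upper respiratory tract infection",
        plan="Symptomatic treatment; review if symptoms persist",
    )

def sign_note(note: SoapNote, physician_id: str, approved: bool) -> SoapNote:
    """The note enters the record only after explicit physician approval."""
    if not approved:
        raise ValueError("Note rejected; return draft for correction")
    note.signed_by = physician_id
    return note

draft = draft_note(transcribe(b"...consultation audio..."))
signed = sign_note(draft, physician_id="DR-0421", approved=True)
```

The design point is that `signed_by` is the only path into the official record: an unsigned draft is structurally incomplete, which encodes the mandatory-review constraint rather than leaving it to policy alone.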
Several private hospital groups in Malaysia and Singapore have piloted and deployed this workflow. The documented efficiency gains are consistent: physicians report spending 40 to 60% less time on note documentation per patient encounter. For a clinician seeing 20 to 30 patients per day, this translates to one to two hours reclaimed daily — time that can go to additional patient care, teaching, research, or recovery.
The safety profile is acceptable because the physician reviews every note before it enters the medical record. The LLM is not making clinical decisions; it is generating a structured representation of a conversation that a physician then owns. If the generated note contains an error — a medication mis-transcribed, a clinical finding incorrectly characterised — the reviewing physician catches it. The safety constraint is the physician’s review, and that review must be mandatory, not optional.
The practical implementation considerations are not trivial. Audio quality varies significantly between a quiet consulting room and a busy ward round with background noise and multiple speakers. Transcription accuracy for Malaysian clinical English — which includes code-switching, abbreviations, and local medication brand names — requires validation on representative audio samples, not just benchmark performance on standardised tests. Clinician adoption is the most common failure mode: if the review interface is cumbersome or the generated note quality is inconsistent, physicians revert to manual documentation.
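Validation on representative audio samples usually means measuring word error rate (WER) on locally collected transcripts. A toy WER implementation, sketched here with a code-switched example sentence (the sample strings are invented for illustration):

```python
# Word error rate via word-level Levenshtein distance.
# Real evaluations also normalise casing, punctuation, and numerals.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[-1][-1] / max(len(ref), 1)

# One substituted word out of five -> WER of 0.2
score = wer("start ubat panadol twice daily",
            "start ubat panado twice daily")
```

Running this over a few hundred representative consultations, stratified by setting (quiet clinic vs. noisy ward) and by language mix, gives a far more honest picture than a vendor's benchmark figure.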
Privacy is a foundational constraint. Patient audio recordings are sensitive personal health data under the Personal Data Protection Act 2010 and subject to additional Ministry of Health data governance requirements. The audio capture, transmission, transcription, and storage must comply with data residency requirements. Sending patient audio to third-party cloud APIs located outside Malaysia requires explicit contractual and technical controls, and in many institutional settings is not permissible without specific DPA authorisation. On-premise or private cloud deployment of the ASR and language model components is the appropriate architecture for most Malaysian healthcare institutions.
Use Case 2: Medical Coding Assistance
Medical coding — the translation of clinical documentation into ICD-10, CPT, and procedure codes for billing and epidemiological purposes — is a specialised function that sits between clinical and administrative operations. A clinical coder reads a physician’s notes, operative reports, and discharge summaries and assigns the appropriate code set. The work is skilled, consequential for billing accuracy, and volume-intensive.
GenAI-assisted coding presents the LLM with the clinical documentation and requests code suggestions, with confidence scores, for the coder’s review. The coder reviews the suggestions, applies professional judgment, and submits the final code set. The LLM does not submit codes; the coder does.
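The coder-in-the-loop pattern can be sketched as a triage queue: model confidence never auto-submits a code, it only orders the coder's review work. The suggestion values and threshold below are illustrative, not from any specific coding product.

```python
from dataclasses import dataclass

@dataclass
class CodeSuggestion:
    icd10: str
    description: str
    confidence: float  # model-reported score, 0.0 to 1.0

def triage(suggestions, close_review_threshold=0.9):
    """Split suggestions into likely-accept and needs-close-review piles.
    Every code still passes through the coder; the threshold only
    prioritises where scrutiny is spent."""
    likely = [s for s in suggestions if s.confidence >= close_review_threshold]
    review = [s for s in suggestions if s.confidence < close_review_threshold]
    return likely, review

suggestions = [
    CodeSuggestion("J18.9", "Pneumonia, unspecified organism", 0.94),
    CodeSuggestion("E11.9", "Type 2 diabetes without complications", 0.71),
]
likely, review = triage(suggestions)
```

The key property is that the return value is two review queues, not a submission: the final code set exists only after the coder acts on both piles.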
Documented deployments in the US and Australia report 30 to 50% reductions in per-encounter coding time with this approach, with accuracy held steady or improved as long as coder review is retained. The coding profession does not disappear — the coder’s role shifts from generating code sets to reviewing and validating suggestions, which requires the same level of domain expertise but at higher throughput.
In the Malaysian context, the transition from ICD-10 to ICD-11 represents both a challenge and an opportunity for AI-assisted coding. ICD-11 has a significantly expanded code set and a more complex hierarchical structure. Coders trained on ICD-10 face a substantial learning curve. AI-assisted coding tools that are trained on ICD-11 can serve as a reference layer during the transition, reducing the per-encounter effort of applying an unfamiliar code set while coders build familiarity.
The compliance consideration in Malaysia is the relationship between coding accuracy and the National Health Insurance claims process, including claims to insurers and managed care organisations. Incorrect coding affects reimbursement and creates audit exposure. Deploying AI-assisted coding without validating accuracy on Malaysian clinical documentation — which may differ from the US or UK training data of most commercial coding tools — introduces risk that must be assessed before production deployment.
Use Case 3: Patient Communication Drafting
Discharge summaries, appointment reminders, medication instruction sheets, post-procedure care instructions, and health education materials require clear, accessible communication — in plain language, often in both English and Bahasa Malaysia, calibrated to the likely literacy level of the recipient.
Generating these documents manually for every patient is time-consuming. Generating them with a template-plus-merge approach produces text that is accurate but impersonal and sometimes difficult to parse. LLM-assisted drafting offers a middle path: the clinical data (diagnosis, procedure, medication list, follow-up instructions) feeds into a structured prompt, and the LLM generates a patient-appropriate communication that a clinical staff member reviews before sending.
The risk profile is low for most patient communication use cases — lower than for clinical notes, because the output does not directly affect clinical decisions. A discharge summary that is less than perfectly written creates a communication difficulty; a clinical note with an error can affect subsequent clinical management. The review step remains important, but the stakes per document are lower.
Bilingual generation is one of the areas where this use case is particularly valuable in Malaysia. Generating a medication instruction sheet in both English and Bahasa Malaysia requires either two separate drafting efforts or a translation step. An LLM that generates both in a single pass, calibrated to plain language in both, reduces that effort substantially. Translation quality for healthcare vocabulary in BM requires validation — medical terms and medication names have standard BM translations that must be used consistently — but the efficiency gain for high-volume patient education materials is real.
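The structured-prompt step described above can be sketched as follows. The template wording and field names are assumptions for illustration, not a specific product's API; the clinical values are invented.

```python
# Assemble a structured prompt for a bilingual discharge instruction sheet
# from discrete clinical fields, rather than free-form dictation.

def build_discharge_prompt(diagnosis, medications, follow_up):
    med_lines = "\n".join(f"- {m}" for m in medications)
    return (
        "Draft patient discharge instructions in plain language, "
        "first in English, then in Bahasa Malaysia.\n"
        f"Diagnosis: {diagnosis}\n"
        f"Medications:\n{med_lines}\n"
        f"Follow-up: {follow_up}\n"
        "Use standard BM medical terminology and medication names; "
        "avoid clinical jargon in both languages."
    )

prompt = build_discharge_prompt(
    diagnosis="Community-acquired pneumonia",
    medications=["Amoxicillin 500 mg, three times daily for 7 days"],
    follow_up="Clinic review in 1 week",
)
```

Feeding discrete fields rather than raw notes keeps the drafting step grounded in verified data, and generating both languages in one pass is what makes the single review step practical.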
Use Case 4: Clinical Decision Support — Where the Line Is
The previous three use cases sit on one side of a clear boundary: the LLM generates or structures information, and a qualified clinician reviews and acts. Clinical decision support — using an LLM to suggest diagnoses, recommend treatments, or assess patient risk — sits on the other side of that boundary, and the distinction matters profoundly.
Clinical decision support tools are regulated as Software as a Medical Device (SaMD) under Malaysia’s Medical Device Authority (MDA), under the Medical Device Act 2012 and the Medical Device Regulations 2012. The classification of a specific tool depends on its intended use and the clinical risk level: a tool that suggests diagnoses or guides treatment decisions in higher-risk scenarios is classified as a higher-risk medical device and requires product registration, clinical evidence of safety and performance, and ongoing post-market surveillance.
Beyond the regulatory framework, the technical performance of general-purpose LLMs on clinical decision support tasks is not at a level that justifies deployment without extensive validation in specific clinical contexts. LLM hallucination — producing confident, plausible-sounding but incorrect clinical content — is not an acceptable failure mode when the output influences a diagnosis or treatment. A study published in JAMA Internal Medicine in 2023 found that while LLMs performed comparably to physicians on standardised clinical knowledge tests, performance on real clinical vignettes — particularly rare presentations, complex multimorbidity, and atypical symptom patterns — was substantially less reliable. The gap between benchmark performance and real-world clinical decision accuracy is the risk that makes autonomous or semi-autonomous clinical decision support a different risk category from documentation and coding.
This does not mean clinical decision support is off the table. It means the development and deployment pathway is longer, the validation requirements are more stringent, and the oversight model is more conservative. Purpose-built clinical AI systems — validated on Malaysian patient populations, registered with the MDA, with ongoing monitoring of clinical outcomes — are the appropriate architecture for this use case. General-purpose LLMs deployed without this infrastructure are not.
The Malaysian Regulatory Context
The MDA’s approach to AI-based medical devices follows international frameworks, including the International Medical Device Regulators Forum (IMDRF) guidance on SaMD and the FDA’s published frameworks on AI/ML-based medical devices. The key regulatory considerations for any AI system used in clinical settings in Malaysia are:
Registration: software that meets the definition of a medical device under the Medical Device Act 2012 requires registration with the MDA before it can be legally supplied. The registration pathway depends on the device risk classification.
Clinical evidence: for Class C and Class D medical devices (medium-high and high risk), clinical evidence of safety and performance is required. For AI systems used in diagnostic or treatment contexts, this typically means clinical studies on relevant patient populations, not just benchmark results.
Post-market surveillance: registered medical devices must have post-market surveillance systems in place. For AI systems, this includes monitoring of real-world clinical performance and a mechanism for detecting performance degradation or unexpected adverse outcomes.
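One concrete mechanism for detecting performance degradation is tracking the share of AI outputs that reviewers accept unchanged, and flagging a sustained drop against a validation-time baseline. A minimal sketch, with illustrative thresholds rather than regulatory ones:

```python
from collections import deque

class AcceptanceMonitor:
    """Rolling-window monitor of reviewer acceptance rate."""

    def __init__(self, baseline=0.85, window=200, tolerance=0.10):
        self.baseline = baseline      # acceptance rate at validation
        self.tolerance = tolerance    # permitted drop before alerting
        self.outcomes = deque(maxlen=window)

    def record(self, accepted: bool):
        self.outcomes.append(accepted)

    def degraded(self) -> bool:
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # insufficient data for a stable estimate
        rate = sum(self.outcomes) / len(self.outcomes)
        return rate < self.baseline - self.tolerance

monitor = AcceptanceMonitor(baseline=0.85, window=50, tolerance=0.10)
for _ in range(50):
    monitor.record(False)  # simulate a run of rejected outputs
```

A degradation signal like this does not diagnose the cause — it triggers the human investigation that post-market surveillance obligations require.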
The practical implication is that GenAI tools used for ambient scribing, coding assistance, and patient communication — which do not directly affect clinical decisions — generally fall outside the SaMD definition or in lower risk classifications. Tools used for diagnosis, treatment recommendation, or clinical risk stratification are likely to require registration. Legal and regulatory counsel should be engaged for any deployment that sits near the boundary.
Data Privacy in Healthcare Settings
Patient health information is among the most sensitive personal data in any jurisdiction. In Malaysia, the Personal Data Protection Act 2010 applies to the processing of personal data by commercial entities, including private hospitals. Additional Ministry of Health data governance frameworks apply to public sector healthcare institutions. Specific requirements govern the processing of sensitive personal data, of which health data is an explicit category.
The implications for GenAI deployment: patient data used to train, fine-tune, or ground LLM outputs must be handled under appropriate consent and legal authority. Patient audio recordings used for ambient scribing require explicit consent, documented in the medical record. Data retention for AI-processed health data must align with the relevant retention schedules. Data residency — keeping patient health data within Malaysia — is a practical requirement for many institutions and a regulatory expectation in the public sector.
Third-party LLM API providers whose infrastructure is located outside Malaysia present data residency challenges. Contractual controls — data processing agreements, subprocessor agreements, geographic restriction clauses — may be sufficient for some use cases and insufficient for others. Private cloud or on-premise deployment of open-source models eliminates the data residency concern by keeping patient data within institutional infrastructure.
The Productive Use of GenAI in Healthcare
The healthcare GenAI deployments that are delivering genuine value in Malaysia and the broader region share a common characteristic: they are on the documentation and administrative side of clinical work, with a qualified clinician or coder reviewing every output before it affects patient care or billing.
The opportunity is real. Reducing the documentation burden on clinicians — which is both a productivity issue and a clinician wellbeing issue — is a meaningful outcome. Improving coding accuracy and throughput reduces administrative waste. Generating clearer patient communications improves health literacy and care adherence.
The constraint is not the technology. Current LLMs are capable of producing useful clinical documentation drafts, relevant code suggestions, and clear patient education materials. The constraint is knowing which side of the clinical decision boundary a given use case sits on — and building the deployment accordingly.
The ambient scribing tool that saves a physician two hours per day is a valuable, deployable system today. The diagnostic assistant that suggests a differential diagnosis from a patient history is a system that requires a different level of regulatory and clinical validation before it should be near a patient. Knowing the difference is the prerequisite for making good decisions about where and how to deploy.
Related Reading
- Building Responsible AI Policies for Your Organisation — A governance framework that every healthcare organisation should have in place before deploying GenAI in clinical or administrative workflows.
- Evaluating LLM Output in Production — Quality evaluation methods that are especially critical when LLM output feeds into clinical documentation or coding systems.
- Nematix Generative AI Services — See how Nematix helps healthcare organisations build GenAI systems that stay on the right side of the clinical decision boundary.
Learn how Nematix’s Innovation Engineering services help businesses build production-ready AI systems.