Data Intelligence and Analytics
Building a Customer 360 Personalisation Platform
Jan 28, 2025

A regional e-commerce platform had four million registered customers, sixty-four million annual transactions, and almost no ability to understand the relationship between them. Customer data lived in five separate systems. No marketer could answer a basic question — “what did this customer do last week?” — without extracting data from multiple systems and joining it in a spreadsheet. Nematix built the unified customer data platform they needed, and the personalisation models that ran on top of it.

The outcome: five months from kickoff to the first model in production, an 18% lift in average order value from personalised recommendations, and a churn prediction model that identified 34% of at-risk customers before they left.

The Situation

The client operated across four Southeast Asian markets, selling across fashion, electronics, and home goods. They had grown through acquisition and organic expansion, and their data infrastructure reflected that history: five systems, each chosen for its own purpose, none designed to talk to the others.

The five systems were: Shopify (order management and product catalogue), Zendesk (customer support), Salesforce (CRM and loyalty programme), a proprietary analytics data warehouse built in-house three years earlier, and a mobile app with its own event tracking pipeline.

Each system had a different primary key for customers. Shopify used email addresses. Zendesk used Zendesk user IDs. Salesforce used a CRM ID generated at account creation. The mobile app used device identifiers. The in-house warehouse had its own synthetic customer ID generated from a combination of email and phone — a scheme that broke when customers changed their email addresses.

The data science team of four had genuine capability — they had built predictive models before — but were running analyses on exported CSV files, spending more time preparing data than building models. They had no infrastructure to productionise anything they built.

The Challenge

Three specific problems made this harder than a standard CDP implementation.

Identity resolution at scale. Unifying 4.1 million customer records across five systems with five different primary keys required deterministic matching (same email, same phone) and probabilistic matching (similar name, overlapping purchase history, same device used on both Shopify web and the mobile app). Getting this wrong would create ghost profiles — customers counted twice — or merge profiles incorrectly, sending the wrong recommendations to the wrong person.

Data freshness asymmetry. The mobile app generated real-time event data (page views, add-to-cart, search queries) while Shopify synced nightly. Mixing real-time signals with day-old purchase data without accounting for the time gap produced misleading conclusions — for example, recommending a product to someone who had purchased it six hours earlier but whose purchase hadn’t yet synced.
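One way to handle the gap is to filter recommendation candidates against the real-time event stream before serving. The sketch below is illustrative only; the function and event names (`filter_recommendations`, `RECENT_EVENTS`) are assumptions, not the client's actual implementation.

```python
from datetime import datetime

# Hypothetical sketch: before serving recommendations, drop any product the
# customer bought in the real-time event stream since the last nightly
# Shopify sync, so stale batch data cannot recommend an already-bought item.
RECENT_EVENTS = [
    # (customer_id, event_type, product_id, timestamp)
    ("c1", "purchase", "sku-42", datetime(2025, 1, 27, 18, 0)),
    ("c1", "page_view", "sku-99", datetime(2025, 1, 27, 19, 0)),
]

def filter_recommendations(customer_id, candidates, last_batch_sync,
                           events=RECENT_EVENTS):
    """Remove candidate products already purchased after the last batch sync."""
    recently_bought = {
        product for (cust, etype, product, ts) in events
        if cust == customer_id and etype == "purchase" and ts > last_batch_sync
    }
    return [p for p in candidates if p not in recently_bought]

last_sync = datetime(2025, 1, 27, 2, 0)  # nightly Shopify sync time
print(filter_recommendations("c1", ["sku-42", "sku-7"], last_sync))  # → ['sku-7']
```

The same pattern generalises: any signal that lives only in the fast path gets consulted at serving time rather than waiting for the batch layer to catch up.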

PDPA compliance. Personalisation models trained on customer data required demonstrable consent for each data element used. The existing systems had inconsistent consent collection — some customers had consented to marketing use of their data, others had not, and there was no single place to look this up. Building a personalisation engine on data that couldn’t be legally used was not an option.

Our Approach

Weeks 1–4: Data audit and architecture design

We mapped every data element in all five systems, documented the customer identifier scheme in each, and profiled data quality: completeness, freshness, consistency. The audit revealed that 23% of Zendesk records had no email address (support tickets opened via phone), and that the in-house warehouse had 800,000 duplicate records from the broken email-based ID scheme.

The target architecture: a Customer Data Platform on Databricks with Delta Lake as the storage layer, Apache Kafka for real-time event streaming, and a consent management service as a prerequisite to any downstream personalisation use.

Weeks 5–12: Consent management and identity resolution

Before building anything that used customer data for personalisation, we built the consent management layer — a service that queried each source system’s consent records and produced a unified consent profile per customer. Only customers with confirmed marketing consent were eligible for personalisation.
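A unified consent profile reduces to a merge rule over per-system consent records. The rule sketched below (an explicit opt-out anywhere wins; otherwise require at least one explicit opt-in) is an assumption for illustration, not the client's actual policy.

```python
# Hedged sketch of the unified consent profile. The merge rule is an
# assumption: opt-out in any source system wins; otherwise at least one
# explicit opt-in is required before personalisation is allowed.

def unified_consent(records):
    """records: list of (source_system, consent) where consent is
    True (opted in), False (opted out), or None (no record)."""
    values = [c for _, c in records if c is not None]
    if not values:
        return "unknown"   # no consent signal in any system
    if any(c is False for c in values):
        return "denied"    # an opt-out anywhere overrides opt-ins elsewhere
    return "granted"       # at least one explicit opt-in, no opt-outs

def eligible_for_personalisation(records):
    return unified_consent(records) == "granted"

print(eligible_for_personalisation([("salesforce", True), ("shopify", None)]))   # True
print(eligible_for_personalisation([("salesforce", True), ("zendesk", False)]))  # False
```

Making the rule a single pure function also makes it auditable: compliance reviewers can read one place instead of five systems.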

The identity resolution service used a three-tier matching approach: exact match (email or phone), high-confidence probabilistic match (name + location + overlapping purchase history), and low-confidence probabilistic match (flagged for manual review). Of 4.1 million customer records, 3.7 million (90%) were successfully unified into single profiles. The remaining 400,000 were kept as isolated, unmatched records and excluded from personalisation until additional signals became available.
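The tiers can be sketched as a scoring function. The thresholds (0.85 / 0.6) and the name/location weighting below are hypothetical; the production service used richer signals such as purchase-history overlap and shared device IDs.

```python
from difflib import SequenceMatcher

# Illustrative three-tier matcher. Thresholds and weights are assumptions,
# chosen only to show the exact / high-confidence / manual-review split.

def match_tier(a, b):
    """a, b: dicts with optional 'email', 'phone', 'name', 'city' keys.
    Returns 'exact', 'high', 'review', or 'no_match'."""
    # Tier 1: deterministic match on a shared identifier
    if a.get("email") and a.get("email") == b.get("email"):
        return "exact"
    if a.get("phone") and a.get("phone") == b.get("phone"):
        return "exact"
    # Tiers 2-3: probabilistic score from weaker signals
    name_sim = SequenceMatcher(None, a.get("name", ""), b.get("name", "")).ratio()
    same_city = bool(a.get("city")) and a.get("city") == b.get("city")
    score = 0.7 * name_sim + 0.3 * (1.0 if same_city else 0.0)
    if score >= 0.85:
        return "high"      # auto-merge
    if score >= 0.6:
        return "review"    # low confidence: flag for manual review
    return "no_match"
```

The key design point survives the simplification: deterministic matches merge automatically, while borderline probabilistic matches are routed to humans rather than silently merged.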

Weeks 13–22: Data platform and feature store

With identity resolution in place, we built the CDP on Databricks: real-time event ingestion via Kafka for mobile and web events, and nightly batch ingestion from Shopify, Zendesk, and Salesforce. The unified customer profile refreshed every 15 minutes, and a feature store exposed 140 pre-computed customer features (recency, frequency, monetary value, category affinity, device preference, support history) for both model training and serving.
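Three of those features, the classic RFM trio, can be computed as below. This is a minimal sketch with illustrative names; the real feature store computed 140 features on Databricks, not per-customer Python.

```python
from datetime import date

# Minimal sketch of recency / frequency / monetary features for one
# customer. Function and field names are illustrative.

def rfm_features(transactions, as_of):
    """transactions: list of (order_date, order_value) for one customer."""
    if not transactions:
        return {"recency_days": None, "frequency": 0, "monetary": 0.0}
    last_order = max(d for d, _ in transactions)
    return {
        "recency_days": (as_of - last_order).days,  # days since last order
        "frequency": len(transactions),             # order count
        "monetary": sum(v for _, v in transactions) # total spend
    }

txns = [(date(2025, 1, 10), 59.90), (date(2024, 12, 2), 120.00)]
print(rfm_features(txns, as_of=date(2025, 1, 28)))
```

Pre-computing these once and serving them via API is what lets both the recommendation and churn models share the same feature definitions.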

Weeks 20–24: Model development and deployment

Two models were built and deployed:

Product recommendation model: Collaborative filtering using the Alternating Least Squares algorithm, trained on 18 months of purchase and browsing history. Served as a real-time API called by the mobile app and web product pages. Personalised recommendations replaced the static “trending” carousel.
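The idea behind ALS can be shown on a toy explicit interaction matrix: alternately solve least-squares problems for user and item factors. This is a didactic sketch only; the production model trained on 18 months of implicit purchase and browsing signals at scale (e.g. via a distributed ALS implementation such as Spark ML's), not on a dense NumPy matrix.

```python
import numpy as np

# Toy alternating least squares on a tiny users x items matrix.
# 0 entries are treated as unobserved; predictions there become
# recommendation candidates.

def als(R, k=2, n_iters=20, reg=0.1, seed=0):
    """Factorise R into U @ V.T using alternating ridge regressions."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, k))
    V = rng.normal(scale=0.1, size=(n_items, k))
    mask = R > 0
    for _ in range(n_iters):
        for u in range(n_users):          # fix V, solve for each user vector
            idx = mask[u]
            if idx.any():
                A = V[idx].T @ V[idx] + reg * np.eye(k)
                U[u] = np.linalg.solve(A, V[idx].T @ R[u, idx])
        for i in range(n_items):          # fix U, solve for each item vector
            idx = mask[:, i]
            if idx.any():
                A = U[idx].T @ U[idx] + reg * np.eye(k)
                V[i] = np.linalg.solve(A, U[idx].T @ R[idx, i])
    return U, V

R = np.array([[5.0, 3.0, 0.0],
              [4.0, 0.0, 1.0],
              [0.0, 1.0, 5.0]])
U, V = als(R)
pred = U @ V.T  # scores for unobserved cells rank the recommendations
```

Serving then reduces to a dot product between a user vector and candidate item vectors, which is what makes a real-time recommendation API feasible.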

Churn prediction model: Logistic regression on RFM (recency, frequency, monetary) features plus support ticket history and session frequency. Scored weekly for all active customers. Customers above the churn probability threshold were entered into a retention campaign sequence.
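A logistic regression churn scorer is simple enough to sketch end to end. The features, training data, and gradient-descent fit below are hypothetical; the real model added support-ticket and session-frequency features and was trained and scored on the platform, not in pure Python.

```python
import math

# Hypothetical sketch: logistic regression on two RFM-style features,
# fitted with plain batch gradient descent.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.1, n_iters=2000):
    """X: list of feature tuples, y: 0/1 churn labels. Returns (weights, bias)."""
    n_features = len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(n_iters):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

# Illustrative features: (recency_days / 100, orders_per_month); label 1 = churned.
X = [(0.9, 0.1), (0.8, 0.2), (0.1, 2.0), (0.2, 1.5)]
y = [1, 1, 0, 0]
w, b = train_logreg(X, y)

# Score a high-recency, low-frequency customer: lands on the churn side.
score = sigmoid(sum(wj * xj for wj, xj in zip(w, (0.85, 0.15))) + b)
```

Customers whose score exceeds the chosen threshold are then pushed into the retention campaign sequence, as described above.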

Outcome

| Metric | Before | After (90 days post-launch) |
| --- | --- | --- |
| Customer profiles unified | — | 3.7M / 4.1M (90%) |
| Average order value | Baseline | +18% |
| Churn prediction recall (30-day window) | — | 34% of churned customers identified in advance |
| Time to run a customer segment analysis | 2–3 days | 2–4 hours |
| Data science time on data preparation | ~70% | ~25% |
| Models in production | 0 | 2 |

The personalisation carousel replaced the static trending module on both the mobile app and web. The churn model’s output fed directly into the CRM for retention campaign triggering.

Key Takeaways

Consent is infrastructure, not compliance. Building the consent management layer first — before building any personalisation capability — meant every model that ran on top of it was legally defensible from day one. Retrofitting consent into a working personalisation system is significantly harder and more expensive.

Identity resolution quality determines everything downstream. A personalisation model trained on a customer profile that conflates two different people will produce recommendations for neither. The three-tier matching approach, with explicit handling of low-confidence matches, produced a unified profile that the data science team could trust.

A feature store changes what’s possible. Pre-computing 140 customer features and making them available via a serving API meant the data science team could train and deploy a new model in days rather than weeks. The infrastructure investment paid back immediately in model iteration speed.


This engagement draws on our Data Intelligence & Analytics services. Speak with our team if you’re looking to unify your customer data and activate it.