Forecasting · Conformal Prediction · XGBoost · GCP Vertex AI · dbt · MLOps · Uncertainty Quantification

Share of Voice forecasting system -- Fortune 500 APAC (Artefact)

Artefact -- Senior Data Scientist (Feb 2025 -- Nov 2025) · Lead ML Engineer (Founding Technical Member, Jakarta Office)

A Fortune 500 global beauty conglomerate needed a data-driven system for measuring and forecasting Share of Voice (SoV) -- their relative brand presence across competitor-inclusive media impressions -- across 6 APAC markets, 5 product categories, 4 divisions, and 4 digital platforms (Meta, Instagram, TikTok, YouTube). Media Directors and Chief Media Officers were making over- vs. under-investment decisions by brand and market using manual spreadsheet benchmarking, updated monthly with a 3-4 week lag.

The forecasting problem is high-dimensional and sparse: the (category x country x division x platform) grain produces hundreds of independent time series with short histories, seasonal patterns driven by campaign cycles, and competitive signal entangled with organic brand momentum. The initial baseline -- a linear regression -- failed to capture non-linear interaction effects between platform algorithm changes and competitor spend surges. More critically, point forecasts were insufficient for the use case: media planners needed calibrated uncertainty ranges to justify budget reallocation decisions to CFOs, not single-number predictions that communicated false precision.

Migrated baseline linear regression to XGBoost after resolving regulatory and stakeholder explainability constraints (SHAP feature importance validated against domain knowledge with client media teams). Integrated third-party competitive intelligence signals from SimilarWeb (web traffic) and Traackr (influencer engagement) alongside platform-native social engagement features via dbt transformation pipelines on BigQuery.
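The explainability gate above can be sketched as a ranking check reviewed with domain experts. A minimal sketch: the project used SHAP on XGBoost, but here scikit-learn's permutation importance and a gradient-boosted stand-in are used so the example is self-contained; the feature names and synthetic target are hypothetical, chosen so a couple of drivers dominate the way a media team's domain review would expect.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance

# Hypothetical feature names standing in for the real SoV feature set.
feature_names = [
    "competitor_spend", "organic_engagement", "web_traffic_similarweb",
    "influencer_engagement_traackr", "platform_impressions", "seasonality_index",
]

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, len(feature_names)))
# Synthetic target: two features dominate, mimicking what a domain review
# would expect (competitor spend and third-party web traffic as key drivers).
y = 3.0 * X[:, 0] + 2.0 * X[:, 2] + 0.1 * rng.normal(size=1000)

model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Permutation importance: drop in predictive score when a feature is shuffled.
imp = permutation_importance(model, X, y, n_repeats=5, random_state=0)

# Rank features for the walkthrough with the client media team; the ordering
# is what gets validated against domain knowledge, not the raw scores.
ranking = sorted(zip(feature_names, imp.importances_mean), key=lambda t: -t[1])
for name, score in ranking:
    print(f"{name:32s} {score:.3f}")
```

The same review works with SHAP values from a `TreeExplainer`; the check is identical either way: the model's top-ranked drivers must match what the media team already believes moves SoV, or the feature set gets revisited.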

Designed split conformal prediction wrappers on XGBoost outputs to deliver calibrated prediction intervals with coverage guarantees -- replacing point forecasts with plannable uncertainty ranges. Built evaluation pipelines tracking calibration drift and empirical coverage as observable KPIs (target: 90% PI empirical coverage on rolling holdout windows), establishing UQ as a first-class production metric alongside point forecast accuracy.
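The split conformal wrapper and its coverage KPI can be sketched end to end. A minimal sketch under stated assumptions: synthetic data and scikit-learn's `GradientBoostingRegressor` stand in for the real SoV panel and XGBoost model; the mechanics (calibration residuals, finite-sample quantile, empirical coverage on holdout) are standard split conformal prediction.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the SoV panel data.
X, y = make_regression(n_samples=2000, n_features=8, noise=10.0, random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_cal, X_test, y_cal, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)

# Split conformal: nonconformity scores are absolute residuals on a
# held-out calibration set the model never trained on.
alpha = 0.10  # target 90% prediction-interval coverage
scores = np.abs(y_cal - model.predict(X_cal))
n = len(scores)
# Finite-sample corrected quantile: ceil((n+1)(1-alpha)) / n.
level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
q = np.quantile(scores, level, method="higher")

# Calibrated intervals: point forecast +/- conformal quantile.
pred = model.predict(X_test)
lower, upper = pred - q, pred + q

# Empirical coverage on the holdout window -- the observable KPI that is
# tracked in production alongside point-forecast accuracy.
coverage = np.mean((y_test >= lower) & (y_test <= upper))
print(f"empirical coverage: {coverage:.3f}")
```

Tracking `coverage` on rolling holdout windows is what turns calibration into a monitorable production metric: sustained drift below the 90% target flags that the calibration set no longer represents current conditions and the quantile needs refreshing.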

Full pipeline on GCP Vertex AI Workbench: automated data ingestion scripts, dbt for feature transformation and versioning, MLflow-equivalent experiment tracking, and Streamlit dashboards translating model outputs into stacked market-share visualizations for non-technical commercial leads. Established engineering baseline (code review, reproducibility standards, config management) as sole technical IC -- setting the production ML delivery standard for a newly opened office.

Developed RAG-based insights PoC for business stakeholders; validated output quality through human-in-the-loop evaluation of factual accuracy and relevance; instrumented prompt versioning and response monitoring via LangSmith as part of a structured LLM evaluation pipeline.

Shifted media investment decisions from reactive (monthly lagged reporting) to predictive (calibrated forward-looking uncertainty ranges), directly informing budget allocation decisions by Media Directors and CMOs across 6 APAC markets. Scaled from 1 to 3 countries within 2 months of initial delivery. The conformal prediction framework converted a "model output" into a "decision input" -- the distinction that makes ML commercially durable in enterprise settings.

6 markets · 4 platforms · 5 categories · 4 divisions · 40% latency reduction · 30% decision-cycle acceleration · 90% PI coverage target
XGBoost · Split conformal prediction · GCP Vertex AI · BigQuery · GCS · Cloud Workstation · dbt · Python · SQL · SimilarWeb · Traackr · Streamlit · LangChain · LangSmith · RAG · Calibration drift monitoring