Fraud Detection · Anomaly Detection · Graph Networks · Real-Time Inference · Kafka · Redis · Fintech

Real-time fraud detection pipeline (GDP Labs)

GDP Labs (GLAIR.ai) -- Lead ML Engineer (2021 -- 2023)

A digital payments platform processing high daily transaction volumes faced escalating fraud losses across three distinct attack vectors: account takeover (credential stuffing and session hijacking), synthetic identity fraud (fabricated KYC documents), and merchant collusion (fake transactions that inflate volume to manipulate revenue sharing). Each attack type has its own feature signature and temporal dynamics, so detection required a multi-stage architecture rather than a single classifier.

Fraud detection in payments requires near-real-time inference: a decision must be issued within the transaction authorization window (typically sub-second). This constrains model complexity -- full graph neural network inference on the entire transaction network is not feasible at authorization time. Additionally, class imbalance is extreme (fraud rate typically < 0.1%), and the problem is adversarial: fraudsters adapt to model outputs, creating continuous concept drift that invalidates static models within weeks.
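
To make the imbalance concrete, the sketch below computes the negative-to-positive ratio that is often used as a starting value for XGBoost's `scale_pos_weight` parameter. How imbalance was actually handled in this project is not stated here; the function name and the counts are illustrative assumptions.

```python
def positive_class_weight(n_legit: int, n_fraud: int) -> float:
    """Ratio of legitimate to fraudulent examples.

    Commonly used as a first guess for XGBoost's scale_pos_weight;
    the right value still has to be tuned on validation data.
    """
    if n_fraud <= 0:
        raise ValueError("need at least one fraud example")
    return n_legit / n_fraud


# Illustrative counts only (a 0.1% fraud rate), not real platform data.
weight = positive_class_weight(n_legit=999_000, n_fraud=1_000)
```

At a 0.1% fraud rate each fraud example would be weighted roughly a thousand times more than a legitimate one, which is why precision at the chosen threshold needs to be checked separately.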

Designed a multi-stage pipeline combining three detection layers: (1) rule-based pre-filters for known attack signatures (velocity checks, geolocation anomalies, device fingerprint blacklists) as low-latency first-pass rejection, (2) gradient-boosted classifiers (XGBoost) on engineered transaction features for the statistical anomaly detection layer, (3) graph-based network anomaly detection to identify merchant collusion rings and synthetic identity clusters using structural patterns in the transaction graph.
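
The three layers above can be sketched as a short-circuiting cascade. This is a minimal illustration under stated assumptions, not the production code: the `Transaction` fields, the blacklist, the thresholds, and the decision labels are all hypothetical, the classifier is passed in as a plain scoring function, and the graph layer is assumed to run offline and publish a set of flagged merchants that is only looked up at authorization time.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Set


@dataclass
class Transaction:
    account_id: str
    merchant_id: str
    amount: float
    device_id: str
    txns_last_minute: int  # pre-computed velocity feature


# Hypothetical configuration values, for illustration only.
BLOCKED_DEVICES = {"dev-bad-1"}
MAX_TXNS_PER_MINUTE = 10
SCORE_THRESHOLD = 0.9


def rule_prefilter(txn: Transaction) -> Optional[str]:
    """Stage 1: cheap checks for known attack signatures."""
    if txn.device_id in BLOCKED_DEVICES:
        return "blocked_device"
    if txn.txns_last_minute > MAX_TXNS_PER_MINUTE:
        return "velocity"
    return None


def decide(
    txn: Transaction,
    score_fn: Callable[[Transaction], float],
    flagged_merchants: Set[str],
) -> str:
    """Run the stages in order of cost and stop at the first hit."""
    reason = rule_prefilter(txn)
    if reason is not None:
        return f"reject:{reason}"
    # Stage 2: statistical classifier (e.g. a trained XGBoost model).
    if score_fn(txn) >= SCORE_THRESHOLD:
        return "reject:model_score"
    # Stage 3: lookup against results of the offline graph analysis.
    if txn.merchant_id in flagged_merchants:
        return "review:graph_flag"
    return "approve"
```

Ordering the stages from cheapest to most expensive means most obviously bad traffic never reaches the model, which helps keep the decision inside the authorization window.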

Implemented near-real-time feature serving with Kafka (event streaming for transaction ingestion) and Redis (low-latency feature store for pre-computed aggregate features: rolling velocity counts, behavioral baselines per account). This decouples feature computation from inference, enabling complex features without violating latency SLAs.
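
A rolling velocity count of the kind mentioned above can be sketched with an in-memory stand-in for the feature store. The class and method names are hypothetical, and the sketch assumes events for an account arrive in timestamp order; in a Redis-backed version the same idea is often expressed with one sorted set per account (`ZADD` to record, `ZREMRANGEBYSCORE` to evict, `ZCARD` to count), though the actual key layout used here is not specified.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class RollingVelocityCounter:
    """Counts events per account inside a sliding time window."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def record(self, account_id: str, ts: float) -> None:
        """Called by the stream consumer for each ingested transaction."""
        self._events[account_id].append(ts)

    def count(self, account_id: str, now: float) -> int:
        """Called at inference time; evicts expired events, then counts."""
        events = self._events[account_id]
        cutoff = now - self.window_seconds
        # Drop timestamps at or before the cutoff (outside the window).
        while events and events[0] <= cutoff:
            events.popleft()
        return len(events)


counter = RollingVelocityCounter(window_seconds=60)
for ts in (0, 30, 59):
    counter.record("acct-1", ts)
recent = counter.count("acct-1", now=60)  # event at ts=0 has expired
```

The write path (`record`) runs on the ingestion side and the read path (`count`) on the inference side, which mirrors the decoupling described above.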

Built a continuous retraining pipeline on labeled feedback from the fraud operations team (human-in-the-loop label generation from case resolution outcomes). Deployed with monitoring on feature distribution shifts and score distribution drift as early warning signals for concept drift between retraining cycles.
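
One common way to quantify score distribution drift is the population stability index (PSI). The sketch below is an assumption about how such a monitor could look, not a description of the metric actually deployed; it assumes model scores lie in [0, 1], and the bin count and smoothing constant are illustrative.

```python
import math
from typing import List, Sequence


def histogram(scores: Sequence[float], n_bins: int) -> List[float]:
    """Fraction of scores falling into each of n_bins equal-width bins."""
    if not scores:
        raise ValueError("scores must not be empty")
    counts = [0] * n_bins
    for s in scores:
        # Clamp so that s == 1.0 or slightly out-of-range values stay in range.
        idx = max(0, min(int(s * n_bins), n_bins - 1))
        counts[idx] += 1
    return [c / len(scores) for c in counts]


def psi(
    baseline: Sequence[float],
    current: Sequence[float],
    n_bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """Population stability index between two score samples."""
    total = 0.0
    for p, q in zip(histogram(baseline, n_bins), histogram(current, n_bins)):
        p, q = max(p, eps), max(q, eps)  # avoid log(0) on empty bins
        total += (q - p) * math.log(q / p)
    return total
```

A PSI of zero means the binned distributions match. A frequently quoted rule of thumb treats values above roughly 0.25 as a significant shift, but any alerting threshold would need to be calibrated against the platform's own history.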

Measurably reduced fraudulent transaction approval rates on digital payment flows, with fraud catch rate improvements directly reducing operational chargeback costs and protecting both end users and platform economics. The multi-stage architecture pattern -- rule-based pre-filter, statistical classifier, graph anomaly detector -- became the internal template for subsequent fraud system designs across the client portfolio.

XGBoost · gradient boosting · graph anomaly detection · Apache Kafka · Redis · Python · scikit-learn · Docker · Kubernetes · AWS