VSCOUT: A Hybrid Variational Autoencoder Approach to Outlier Detection in High-Dimensional Retrospective Monitoring

Modern industrial and service processes generate high-dimensional, non-Gaussian, and contamination-prone data that challenge the foundational assumptions of classical Statistical Process Control (SPC). Heavy tails, multimodality, nonlinear dependencies, and sparse special-cause observations can distort baseline estimation, mask true anomalies, and prevent reliable identification of an in-control (IC) reference set. To address these challenges, we introduce VSCOUT, a distribution-free framework designed specifically for retrospective (Phase I) monitoring in high-dimensional settings. VSCOUT combines an Automatic Relevance Determination Variational Autoencoder (ARD-VAE) architecture with ensemble-based latent outlier filtering and changepoint detection. The ARD prior isolates the most informative latent dimensions, while the ensemble and changepoint filters identify pointwise and structural contamination within the ARD-selected latent space. A second-stage retraining step removes flagged observations and re-estimates the latent structure using only the retained inliers, mitigating masking and stabilizing the IC latent manifold. This two-stage refinement produces a clean and reliable IC baseline suitable for subsequent Phase II deployment. Extensive experiments across benchmark datasets demonstrate that VSCOUT achieves superior sensitivity to special-cause structure while maintaining controlled false alarms, outperforming classical SPC procedures, robust estimators, and modern machine-learning baselines. Its scalability, distributional flexibility, and resilience to complex contamination patterns position VSCOUT as a practical and effective method for retrospective modeling and anomaly detection in AI-enabled environments.
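The two-stage idea (score, remove flagged points, re-estimate on the retained inliers, rescore) can be illustrated with a minimal sketch. This is not the VSCOUT implementation: a plain Mahalanobis distance on the raw features stands in for the ARD-VAE latent scoring, the ensemble and changepoint filters are omitted, and the function names, the `trim` fraction, and the median-plus-MAD cutoff with multiplier `c` are all assumptions chosen for illustration.

```python
import numpy as np

def mahalanobis(X, ref):
    """Distance of each row of X from the mean/covariance estimated on ref.

    Stand-in scorer (assumption): the paper scores points in an ARD-VAE
    latent space; this sketch uses the raw features directly.
    """
    mu = ref.mean(axis=0)
    cov = np.cov(ref, rowvar=False)
    diff = X - mu
    # Solve a linear system instead of inverting the covariance explicitly.
    return np.sqrt(np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T))

def two_stage_screen(X, trim=0.10, c=5.0):
    """Hypothetical two-stage screen; trim and c are illustrative values."""
    # Stage 1: fit on all (possibly contaminated) data and drop the
    # highest-scoring fraction. Contamination inflates the covariance,
    # so outlier scores are compressed at this stage (masking).
    s1 = mahalanobis(X, X)
    keep = s1 <= np.quantile(s1, 1.0 - trim)

    # Stage 2: re-estimate on the retained points only, then rescore all points.
    s2 = mahalanobis(X, X[keep])

    # Data-driven cutoff from the retained scores: median + c * scaled MAD.
    med = np.median(s2[keep])
    mad = 1.4826 * np.median(np.abs(s2[keep] - med))
    return s2 > med + c * mad, s1, s2

# Synthetic example: 300 in-control points plus 10 mean-shifted outliers.
rng = np.random.default_rng(0)
inliers = rng.standard_normal((300, 5))
outliers = rng.standard_normal((10, 5)) + 10.0
X = np.vstack([inliers, outliers])

flags, s1, s2 = two_stage_screen(X)
print(flags[300:].sum(), "of 10 injected outliers flagged;",
      flags[:300].sum(), "false alarms among 300 inliers")
```

In this toy setting the injected outliers receive much larger scores after the refit than under the contaminated first-stage estimate, which is the masking effect the retraining step is meant to counter. A deep generative scorer would replace `mahalanobis` in a realistic pipeline.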
