Detecting distributional drift in high-dimensional data streams presents fundamental challenges: global comparison methods scale poorly, projection-based approaches lose geometric structure, and re-clustering methods suffer from identity instability. This paper introduces Argus, a framework that reconceptualizes drift detection as tracking local statistics over a fixed spatial partition of the data manifold. The key contributions are fourfold. First, it is proved that Voronoi tessellations over canonical orthonormal frames yield drift metrics that are invariant to orthogonal transformations, that is, the rotations and reflections that preserve Euclidean geometry. Second, it is established that the framework achieves O(N) complexity per snapshot while providing cell-level spatial localization of distributional change. Third, a graph-theoretic characterization of drift propagation is developed that distinguishes coherent distributional shifts from isolated perturbations. Fourth, product quantization tessellation is introduced for scaling to very high dimensions (d > 500) by decomposing the space into independent subspaces and aggregating drift signals across them. This paper formalizes the theoretical foundations, proves the invariance properties, and presents experimental validation demonstrating that the framework correctly identifies drift under coordinate rotation, whereas existing methods produce false positives. The tessellated approach offers a principled geometric foundation for distribution monitoring that preserves high-dimensional structure without the computational burden of pairwise comparisons.
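The idea of tracking local statistics over a fixed partition can be illustrated with a minimal sketch. It is not the Argus implementation: the function names are hypothetical, the centroids are drawn at random rather than derived from a canonical orthonormal frame, and per-cell occupancy stands in for whatever local statistics the framework tracks. The sketch assigns each point to its nearest centroid (its Voronoi cell), compares per-cell occupancy between a reference and a new snapshot, and checks that the result is unchanged when the same orthogonal transformation is applied to both the data and the centroids.

```python
import numpy as np

def assign_cells(points, centroids):
    """Index of the nearest centroid (Voronoi cell) for each point."""
    # Squared Euclidean distances, shape (N, K); linear in N for a fixed K.
    d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def cell_occupancy(points, centroids):
    """Fraction of points that fall in each cell of the fixed partition."""
    labels = assign_cells(points, centroids)
    counts = np.bincount(labels, minlength=len(centroids))
    return counts / len(points)

def cell_drift(reference, snapshot, centroids):
    """Per-cell occupancy change and its total-variation summary."""
    delta = cell_occupancy(snapshot, centroids) - cell_occupancy(reference, centroids)
    return delta, 0.5 * np.abs(delta).sum()

rng = np.random.default_rng(0)
d, k = 8, 16                                  # illustrative sizes (assumption)
centroids = rng.normal(size=(k, d))           # fixed partition, chosen once
reference = rng.normal(size=(2000, d))
shifted = reference + np.array([1.5] + [0.0] * (d - 1))  # drift along one axis

# Random orthogonal matrix (rotation or reflection) from a QR decomposition.
q, _ = np.linalg.qr(rng.normal(size=(d, d)))

delta, tv = cell_drift(reference, shifted, centroids)
delta_rot, tv_rot = cell_drift(reference @ q, shifted @ q, centroids @ q)
_, tv_same = cell_drift(reference, reference, centroids)

print("cells with largest change:", np.argsort(-np.abs(delta))[:3])
print("drift score:", tv, "after rotation:", tv_rot)
```

Because an orthogonal map preserves Euclidean distances, every point keeps the same nearest centroid after the joint transformation, so the per-cell changes and the summary score match. The vector `delta` also indicates which cells gained or lost mass, which is the cell-level localization the abstract refers to.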