Standard Mean-Shift algorithms are notoriously sensitive to the bandwidth hyperparameter, particularly in data-scarce regimes, where density estimation at a fixed scale leads to fragmented clusters and spurious modes. In this paper, we propose Doubly Stochastic Mean-Shift (DSMS), an extension that introduces randomness not only in the trajectory updates but also in the kernel bandwidth itself. By drawing both the data samples and the radius from a continuous uniform distribution at each iteration, DSMS explores the density landscape more effectively. We show that this randomized bandwidth policy acts as an implicit regularization mechanism, and we provide theoretical convergence results. Comparative experiments on synthetic Gaussian mixtures show that DSMS significantly outperforms standard and stochastic Mean-Shift baselines, remaining stable and preventing over-segmentation in sparse clustering scenarios without degrading performance in other respects.
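To make the two sources of randomness concrete, the following is a minimal sketch of one plausible reading of the update described above, not the exact algorithm from the paper. It assumes a flat (ball) kernel, one trajectory per data point, a uniformly random choice of which trajectory to update at each iteration, and a radius drawn from a uniform distribution on an interval. The function name `dsms` and the parameters `h_min`, `h_max`, `n_iter`, and `seed` are hypothetical, introduced only for illustration.

```python
import numpy as np


def dsms(X, h_min, h_max, n_iter=2000, seed=0):
    """Illustrative doubly stochastic mean-shift sketch (assumed, not the paper's code).

    X       : (n, d) array of data points
    h_min,
    h_max   : bounds of the uniform distribution the radius is drawn from (assumption)
    n_iter  : number of single-trajectory updates
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    Z = X.copy()                      # one trajectory per data point, started at the point
    n = X.shape[0]

    for _ in range(n_iter):
        i = rng.integers(n)           # randomness 1: which trajectory is updated
        h = rng.uniform(h_min, h_max)  # randomness 2: kernel radius for this step

        # Flat-kernel mean-shift step: move to the mean of the data within radius h.
        dist = np.linalg.norm(X - Z[i], axis=1)
        mask = dist <= h
        if mask.any():                # skip the update if the ball is empty
            Z[i] = X[mask].mean(axis=0)

    return Z


if __name__ == "__main__":
    # Small synthetic Gaussian mixture with two well-separated components.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(10.0, 0.1, (20, 2))])
    Z = dsms(X, h_min=1.0, h_max=2.0)
    print(np.round(Z[:3], 3))
    print(np.round(Z[-3:], 3))
```

Under these assumptions, trajectories that end near the same location would be grouped into one cluster. Because the radius changes from step to step, no single fixed scale determines which neighbours contribute to an update, which is the intuition behind the regularization effect claimed above.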