Sliced Mutual Information (SMI) is widely used as a scalable alternative to mutual information for measuring non-linear statistical dependence. SMI has well-known advantages: faster convergence, robustness to high dimensionality, and vanishing only under statistical independence. Nevertheless, we demonstrate that it is highly susceptible to data manipulation and exhibits counterintuitive behavior. Through extensive benchmarking and theoretical analysis, we show that SMI saturates easily, fails to detect increases in statistical dependence (even under linear transformations designed to enhance the extraction of information), prioritizes redundancy over informative content, and, in some cases, performs worse than simpler dependence measures such as the correlation coefficient.
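For readers unfamiliar with the quantity, SMI is usually defined as the mutual information between random one-dimensional projections of the two variables, averaged over projection directions drawn uniformly from the unit spheres. The sketch below is only an illustration of that definition, not the estimator used in this work. It assumes the data are jointly Gaussian, so that each projected pair has the closed-form mutual information -0.5 * log(1 - rho^2), with rho the sample correlation. The function names `sample_sphere` and `gaussian_smi` and the parameters `n_proj` and `seed` are hypothetical.

```python
import numpy as np


def sample_sphere(rng, dim, n):
    """Draw n directions uniformly from the unit sphere in R^dim."""
    v = rng.standard_normal((n, dim))
    return v / np.linalg.norm(v, axis=1, keepdims=True)


def gaussian_smi(x, y, n_proj=500, seed=0):
    """Monte Carlo SMI estimate (in nats) under a joint-Gaussian ASSUMPTION.

    x: array of shape (n_samples, d_x); y: array of shape (n_samples, d_y).
    Each projected pair is scored with the Gaussian formula
    -0.5 * log(1 - rho^2); this is not a general-purpose MI estimator.
    """
    rng = np.random.default_rng(seed)
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    thetas = sample_sphere(rng, x.shape[1], n_proj)
    phis = sample_sphere(rng, y.shape[1], n_proj)
    px = x @ thetas.T  # projected x, shape (n_samples, n_proj)
    py = y @ phis.T    # projected y, shape (n_samples, n_proj)
    # Sample correlation for each pair of projection directions.
    num = (px * py).sum(axis=0)
    den = np.sqrt((px ** 2).sum(axis=0) * (py ** 2).sum(axis=0))
    rho2 = np.clip((num / den) ** 2, 0.0, 1.0 - 1e-12)
    # Average the per-projection Gaussian mutual information.
    return float(np.mean(-0.5 * np.log1p(-rho2)))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.standard_normal((5000, 5))
    y_dep = x + 0.1 * rng.standard_normal((5000, 5))  # strongly dependent on x
    y_ind = rng.standard_normal((5000, 5))            # independent of x
    print("dependent:  ", gaussian_smi(x, y_dep))
    print("independent:", gaussian_smi(x, y_ind))
```

Even when `y_dep` is nearly a copy of `x`, most random direction pairs are far from aligned, so the average stays modest. This gives some intuition for the saturation behavior described above, although the toy Gaussian setting is not a substitute for the benchmarks.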