This paper presents a novel approach to measuring statistical dependence between two random processes (r.p.) using a positive-definite function called the Normalized Cross Density (NCD). NCD is derived directly from the probability density functions of the two r.p. and defines a data-dependent Hilbert space, the Normalized Cross-Density Hilbert Space (NCD-HS). By Mercer's Theorem, the NCD norm can be decomposed into its eigenspectrum, which we name the Multivariate Statistical Dependence (MSD) measure; the sum of the eigenvalues is the Total Dependence Measure (TSD). The NCD-HS eigenfunctions therefore serve as a novel embedded feature space suitable for quantifying the statistical dependence between r.p. To apply NCD directly to r.p. realizations, we introduce an architecture with two multiple-output neural networks, a cost function, and an algorithm named the Functional Maximal Correlation Algorithm (FMCA). With FMCA, the two networks learn concurrently by approximating each other's outputs, extending the Alternating Conditional Expectation (ACE) algorithm to multivariate functions. We mathematically prove that FMCA learns the dominant eigenvalues and eigenfunctions of NCD directly from realizations. Preliminary results with synthetic data and medium-sized image datasets corroborate the theory. We propose and discuss different strategies for applying NCD, demonstrating the method's versatility and stability beyond supervised learning. Specifically, when the two r.p. are high-dimensional real-world images and a white uniform noise process, FMCA learns factorial codes, i.e., the occurrence of a code guarantees that a specific training-set image was present, which is important for feature learning.
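To make the ACE idea that FMCA extends more concrete, the following is a minimal sketch of classical scalar alternating conditional expectations on a small discrete joint probability table. It is not FMCA: there are no neural networks, only one function pair is learned, and the joint distribution is assumed known rather than estimated from realizations. The function names `standardize` and `ace_maximal_correlation`, the table layout, and the iteration count are illustrative assumptions, not taken from the paper.

```python
import math


def standardize(values, probs):
    """Center and scale `values` to zero mean and unit variance under `probs`.

    Returns None when the function is numerically constant.
    """
    mean = sum(v * p for v, p in zip(values, probs))
    centered = [v - mean for v in values]
    var = sum(c * c * p for c, p in zip(centered, probs))
    if var < 1e-24:
        return None
    scale = math.sqrt(var)
    return [c / scale for c in centered]


def ace_maximal_correlation(joint, iters=200):
    """Estimate the maximal correlation of a discrete joint pmf.

    `joint[i][j]` is P(X = i, Y = j); all marginals are assumed to be
    strictly positive. Each iteration replaces f by E[g(Y) | X] and g by
    E[f(X) | Y], re-standardizing after every step (a power iteration).
    """
    nx, ny = len(joint), len(joint[0])
    px = [sum(joint[i]) for i in range(nx)]
    py = [sum(joint[i][j] for i in range(nx)) for j in range(ny)]

    # Arbitrary non-constant starting function of Y (an assumption of this sketch).
    g = standardize([float(j) for j in range(ny)], py)
    if g is None:
        return 0.0

    rho = 0.0
    for _ in range(iters):
        # f(x) = E[g(Y) | X = x], then standardize under p(x).
        f = standardize(
            [sum(joint[i][j] * g[j] for j in range(ny)) / px[i] for i in range(nx)],
            px,
        )
        if f is None:  # conditional expectation is constant: no dependence found
            return 0.0
        # g(y) = E[f(X) | Y = y], then standardize under p(y).
        g = standardize(
            [sum(joint[i][j] * f[i] for i in range(nx)) / py[j] for j in range(ny)],
            py,
        )
        if g is None:
            return 0.0
        # Correlation E[f(X) g(Y)] between the two standardized functions.
        rho = sum(joint[i][j] * f[i] * g[j] for i in range(nx) for j in range(ny))
    return rho
```

For the symmetric binary table `[[0.4, 0.1], [0.1, 0.4]]` this returns 0.6, and for a product (independent) table it returns 0.0. As with any power iteration, a starting function orthogonal to the dominant direction could converge to a smaller correlation; the sketch does not guard against that case.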