Total correlation (TC) is a key measure of the dependence among the marginal variables of a multidimensional random variable, and it is frequently used as an inductive bias in representation learning. Previous research has shown that TC can be estimated by decomposing it into mutual information (MI) terms and applying MI bounds. However, we find through theoretical derivation and qualitative experiments that, because the decomposition relies on importance sampling, the bias of the MI-bound-based TC estimate is amplified when the proposal distribution used for sampling differs significantly from the target distribution. To reduce this estimation bias, we propose a TC estimation correction model based on supervised learning, which takes the sequence of training-iteration losses of the MI-bound-based TC estimator as input features and predicts the true TC value. Experiments show that the proposed method improves the accuracy of TC estimation and eliminates the variance generated by the TC estimation process.
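To make the quantity concrete, the following is a minimal sketch of how TC relates to a sum of MI terms. It is not the estimator or the correction model described above. It assumes a multivariate Gaussian, where both TC and MI have closed forms, and it uses one standard chain-rule decomposition, TC(X) = Σ_{i=1}^{d-1} I(X_{1:i}; X_{i+1}); the decomposition used in the paper may differ. The function names (`gaussian_total_correlation`, `gaussian_mi`, `tc_via_mi_chain`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def gaussian_total_correlation(cov):
    """Closed-form TC (in nats) of a Gaussian with covariance `cov`.

    TC = sum_i H(X_i) - H(X) = 0.5 * (sum_i log cov_ii - log det cov).
    """
    cov = np.asarray(cov, dtype=float)
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (np.sum(np.log(np.diag(cov))) - logdet)


def gaussian_mi(cov, idx_a, idx_b):
    """Closed-form I(X_a; X_b) (in nats) for jointly Gaussian blocks."""
    cov = np.asarray(cov, dtype=float)
    idx_a, idx_b = list(idx_a), list(idx_b)

    def logdet(idx):
        # log-determinant of the covariance sub-block indexed by `idx`
        return np.linalg.slogdet(cov[np.ix_(idx, idx)])[1]

    return 0.5 * (logdet(idx_a) + logdet(idx_b) - logdet(idx_a + idx_b))


def tc_via_mi_chain(cov):
    """TC as a sum of MI terms: sum over i of I(X_{1:i}; X_{i+1})."""
    cov = np.asarray(cov, dtype=float)
    d = cov.shape[0]
    return sum(gaussian_mi(cov, range(i), [i]) for i in range(1, d))


# Illustrative covariance (an assumption, not data from the paper).
cov = np.array([
    [1.0, 0.5, 0.2],
    [0.5, 1.0, 0.3],
    [0.2, 0.3, 1.0],
])
tc_direct = gaussian_total_correlation(cov)
tc_chain = tc_via_mi_chain(cov)
```

In this closed-form setting `tc_direct` and `tc_chain` agree up to floating-point error. A known ground-truth value of this kind is what a learned estimator's output can be compared against when its bias is measured.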