The information bottleneck (IB) is a paradigm for extracting information about a target random variable from another, relevant random variable. It has attracted great interest for its potential to explain deep neural networks in terms of information compression and prediction. Despite its importance, finding the optimal bottleneck variable is a difficult optimization problem because the mutual information constraint is nonconvex. The Blahut-Arimoto (BA) algorithm and its variants address the problem through its Lagrangian with a fixed Lagrange multiplier. However, the BA algorithm can recover the full IB curve only when that curve is strictly concave, which strongly limits its application in machine learning and related fields, where strict concavity cannot be guaranteed. To overcome this difficulty, we derive an entropy-regularized optimal transport (OT) model for the IB problem from a posterior probability perspective. To solve this OT model, we use an alternating optimization procedure and generalize the Sinkhorn algorithm. Numerical experiments demonstrate the effectiveness and efficiency of our approach.
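For orientation, the classical Sinkhorn iteration that the proposed method generalizes can be sketched as follows. This is only the standard algorithm for entropy-regularized OT between two fixed marginals, not the generalized variant derived for the IB problem, whose details are not given in the abstract. The function name `sinkhorn` and the parameters `cost`, `a`, `b`, `eps`, and `n_iter` are illustrative assumptions.

```python
import math

def sinkhorn(cost, a, b, eps=0.5, n_iter=500):
    """Approximate the entropy-regularized OT plan between marginals a and b.

    cost: n x m list of lists, a: length-n marginal, b: length-m marginal,
    eps: entropy regularization weight (hypothetical default).
    """
    n, m = len(a), len(b)
    # Gibbs kernel K_ij = exp(-C_ij / eps)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Rescale rows so that the row sums of the plan match a
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Rescale columns so that the column sums of the plan match b
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan P = diag(u) K diag(v)
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Toy problem: move mass between two 2-point distributions
cost = [[0.0, 1.0], [1.0, 0.0]]
a = [0.5, 0.5]
b = [0.3, 0.7]
plan = sinkhorn(cost, a, b)
```

Each pass alternately rescales the rows and columns of the kernel, so the resulting plan satisfies both marginal constraints up to numerical tolerance once the iteration has converged. Smaller values of `eps` bring the plan closer to the unregularized OT solution but typically slow convergence.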