We show that if the conditional distribution p(C | T) factors through a sufficient statistic φ(T), that is, p(C | T) = p(C | φ(T)), then the Information Bottleneck (IB) problem for (T, C) is exactly equivalent to the IB problem for (φ(T), C). The reduction is loss-free: it preserves the full IB curve, the Lagrangian optimum at every trade-off parameter β, and the optimal representations up to pullback through φ. As a result, the computational complexity of solving the IB problem is governed by the dimension of the sufficient statistic rather than by the ambient dimension of the source. This identifies an exact structural condition under which the generic IB problem becomes tractable, and provides a formal bridge between the discrete and linear-Gaussian regimes. We then show that the classical Gaussian IB solution of Chechik, Globerson, Tishby and Weiss is an immediate corollary of this reduction, and we state a nonlinear-Gaussian generalisation. A small numerical example illustrates the practical consequence: when a low-dimensional sufficient statistic is available, the exact IB curve can be computed on the reduced problem at a cost determined by the dimension of the statistic rather than by that of the source.
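The equivalence rests on two facts that can be checked numerically for discrete variables. The sketch below assumes the standard IB Lagrangian I(T;Z) − β I(Z;C) and a hypothetical toy source with 6 states that φ collapses onto 3. The helpers `mutual_information` and `ib_terms` and the random distributions are illustrative only, not the paper's code. It checks that pulling a reduced encoder q(z | s) back through φ leaves both information terms unchanged. It also checks that averaging an arbitrary encoder q(z | t) over the fibres of φ keeps I(Z;C) fixed and does not increase the compression term.

```python
import numpy as np


def mutual_information(p_xy: np.ndarray) -> float:
    """I(X;Y) in nats for a joint distribution stored as a 2-D array."""
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    mask = p_xy > 0
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / (p_x @ p_y)[mask])))


def ib_terms(p_x: np.ndarray, q_z_given_x: np.ndarray, p_c_given_x: np.ndarray):
    """Return (I(X;Z), I(Z;C)) for an encoder q(z|x) under the Markov chain C - X - Z."""
    p_xz = p_x[:, None] * q_z_given_x   # joint p(x, z)
    p_zc = p_xz.T @ p_c_given_x         # p(z, c) = sum_x p(x) q(z|x) p(c|x)
    return mutual_information(p_xz), mutual_information(p_zc)


rng = np.random.default_rng(0)

# Hypothetical toy instance: |T| = 6 source states, phi collapses them onto 3 states.
phi = np.array([0, 0, 1, 1, 2, 2])
n_t, n_s, n_c, n_z = 6, 3, 4, 3
onehot = np.eye(n_s)[phi]                       # onehot[t, s] = 1 iff phi(t) = s

p_t = rng.dirichlet(np.ones(n_t))               # p(t)
p_c_given_s = rng.dirichlet(np.ones(n_c), n_s)  # p(c | s), one row per reduced state
p_c_given_t = p_c_given_s[phi]                  # assumption: p(c | t) = p(c | phi(t))
p_s = onehot.T @ p_t                            # p(s) = sum over t with phi(t) = s of p(t)

# Pullback: a reduced encoder q(z|s) lifted to T as q(z|t) := q(z|phi(t)).
q_s = rng.dirichlet(np.ones(n_z), n_s)
pulled = ib_terms(p_t, q_s[phi], p_c_given_t)
reduced = ib_terms(p_s, q_s, p_c_given_s)

# Pushforward: an arbitrary encoder on T, averaged within each fibre of phi.
q_t = rng.dirichlet(np.ones(n_z), n_t)
p_sz = onehot.T @ (p_t[:, None] * q_t)          # joint p(s, z)
q_pushed = p_sz / p_s[:, None]                  # q'(z|s)
original = ib_terms(p_t, q_t, p_c_given_t)
pushed = ib_terms(p_s, q_pushed, p_c_given_s)

print("pullback    (I(T;Z), I(Z;C)):", pulled, " reduced:", reduced)
print("pushforward (I(T;Z), I(Z;C)):", original, " reduced:", pushed)
```

Because the pullback preserves both terms for every reduced encoder, and the pushforward never increases the compression term while keeping the relevance term, the two Lagrangians have the same minimum at every β. In this toy setting, any IB solver therefore only needs to run on the 3-state problem.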
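The Gaussian corollary can also be illustrated numerically. The sketch below assumes zero-mean jointly Gaussian (T, C) with a randomly generated joint covariance and hypothetical dimensions dim T = 8, dim C = 2. For such pairs E[C | T] = Σ_CT Σ_T⁻¹ T, so φ(T) = A T with A = Σ_CT Σ_T⁻¹ is a linear sufficient statistic. The code computes the spectrum of the matrix Σ_{T|C} Σ_T⁻¹ used in the Gaussian IB solution of Chechik et al. It then checks three things. First, only dim C eigenvalues differ from 1. Second, the 2×2 reduced problem for φ(T) reproduces exactly those eigenvalues. Third, I(φ(T); C) = I(T; C). The helper names `gaussian_mi` and `gib_spectrum` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
d_t, d_c = 8, 2  # hypothetical dimensions: source T in R^8, relevance variable C in R^2

# Random zero-mean jointly Gaussian (T, C): a positive-definite joint covariance.
B = rng.standard_normal((d_t + d_c, d_t + d_c))
cov = B @ B.T + 0.1 * np.eye(d_t + d_c)
S_t, S_tc = cov[:d_t, :d_t], cov[:d_t, d_t:]
S_ct, S_c = cov[d_t:, :d_t], cov[d_t:, d_t:]


def gaussian_mi(S_x, S_y, S_yx):
    """I(X;Y) = 0.5 * [log det S_y - log det S_{y|x}] in nats, for jointly Gaussian X, Y."""
    S_y_given_x = S_y - S_yx @ np.linalg.solve(S_x, S_yx.T)
    return 0.5 * (np.linalg.slogdet(S_y)[1] - np.linalg.slogdet(S_y_given_x)[1])


def gib_spectrum(S_x, S_cc, S_cx):
    """Sorted eigenvalues of the Gaussian IB matrix S_{x|c} S_x^{-1}."""
    S_x_given_c = S_x - S_cx.T @ np.linalg.solve(S_cc, S_cx)
    return np.sort(np.linalg.eigvals(S_x_given_c @ np.linalg.inv(S_x)).real)


# Full problem: a d_t x d_t eigenproblem; eigenvalues below 1 mark C-relevant directions.
eig_full = gib_spectrum(S_t, S_c, S_ct)
n_relevant = int(np.sum(eig_full < 1 - 1e-9))

# Linear sufficient statistic phi(T) = A T with A = S_ct S_t^{-1}, since E[C | T] = A T.
A = S_ct @ np.linalg.inv(S_t)
S_phi = A @ S_t @ A.T      # Cov(phi(T))
S_c_phi = S_ct @ A.T       # Cov(C, phi(T))

# Reduced problem: only a d_c x d_c eigenproblem.
eig_reduced = gib_spectrum(S_phi, S_c, S_c_phi)

mi_full = gaussian_mi(S_t, S_c, S_ct)          # I(T;C)
mi_reduced = gaussian_mi(S_phi, S_c, S_c_phi)  # I(phi(T);C)

print("relevant eigenvalues (full):", eig_full[:n_relevant])
print("eigenvalues (reduced):      ", eig_reduced)
print("I(T;C) =", mi_full, " I(phi(T);C) =", mi_reduced)
```

The remaining d_t − d_c eigenvalues equal 1. They correspond to directions of T that carry no information about C, which is why the reduced problem loses nothing.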