The Symmetric Information Bottleneck (SIB), an extension of the more familiar Information Bottleneck, is a dimensionality reduction technique that simultaneously compresses two random variables while preserving the information between their compressed versions. We introduce the Generalized Symmetric Information Bottleneck (GSIB), which explores different functional forms of the cost of such simultaneous reduction. We then study the dataset size requirements of this simultaneous compression by deriving bounds and root-mean-squared estimates of the statistical fluctuations of the loss functions involved. We show that, in typical situations, simultaneous GSIB compression requires qualitatively less data to achieve the same error than compressing the variables one at a time. We suggest that this is an example of a more general principle: simultaneous compression is more data efficient than independent compression of each input variable.
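
For orientation, the trade-off described above can be sketched in the form in which these objectives are commonly written. The notation is an assumption and does not appear in the abstract: $Z$ is the compressed version of $X$ in the ordinary Information Bottleneck, $Z_X$ and $Z_Y$ are the compressed versions of $X$ and $Y$ in the symmetric case, $I(\cdot\,;\cdot)$ is mutual information, and $\beta > 0$ is a trade-off parameter. The generalized cost forms that GSIB explores are not reproduced here.

```latex
% Illustrative sketch only; notation assumed, not taken from the abstract.
\begin{align}
  % Information Bottleneck: compress X into Z while keeping Z informative about Y
  L_{\mathrm{IB}}  &= I(X;Z) - \beta\, I(Z;Y), \\
  % Symmetric IB: compress X and Y at once, keeping the compressed
  % versions Z_X and Z_Y informative about each other
  L_{\mathrm{SIB}} &= I(X;Z_X) + I(Y;Z_Y) - \beta\, I(Z_X;Z_Y).
\end{align}
```

In both objectives the terms with a positive sign are the compression cost, and the term multiplied by $\beta$ is the information to be preserved. In the symmetric objective, that preserved information is measured between the two compressed variables rather than between a compressed variable and an uncompressed one.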