We introduce Smooth InfoMax (SIM), a novel method for self-supervised representation learning that incorporates an interpretability constraint into the learned representations at various depths of the neural network. SIM's architecture is split into probabilistic modules, each locally optimized using the InfoNCE bound. Inspired by variational autoencoders (VAEs), the representations from these modules are designed to be samples from Gaussian distributions and are further constrained to be close to the standard normal distribution. This results in a smooth and predictable latent space that can be traversed through a decoder, which simplifies post-hoc analysis of the learned representations. We evaluate SIM's performance on sequential speech data, showing that it performs competitively with its less interpretable counterpart, Greedy InfoMax (GIM). Moreover, we provide insights into SIM's internal representations, demonstrating that the information they contain is less entangled and more concentrated in a smaller subset of the dimensions. This further highlights the improved interpretability of SIM.
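As a rough illustration of the per-module objective described above, the following sketch combines a reparameterized Gaussian sample, a KL penalty toward the standard normal, and an InfoNCE term. It is a minimal sketch under stated assumptions, not the authors' implementation: the dot-product score, the weight `beta`, and all function names (`sample_gaussian`, `kl_to_standard_normal`, `info_nce`, `module_loss`) are hypothetical choices made for illustration, and the abstract does not specify the exact scoring function or how the two terms are weighted.

```python
import math
import random


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sample_gaussian(mu, log_var, rng):
    """Reparameterized sample z = mu + sigma * eps, with eps ~ N(0, I)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, log_var))


def info_nce(context, positive, negatives):
    """InfoNCE loss: negative log-softmax score of the positive sample.

    Assumption: scores are plain dot products; a learned scoring
    function could be substituted here.
    """
    scores = [dot(context, positive)] + [dot(context, n) for n in negatives]
    top = max(scores)  # subtract the max for numerical stability
    log_sum = top + math.log(sum(math.exp(s - top) for s in scores))
    return log_sum - scores[0]


def module_loss(context, mu, log_var, negatives, beta, rng):
    """Hypothetical local loss for one module: InfoNCE + beta * KL."""
    z = sample_gaussian(mu, log_var, rng)
    return info_nce(context, z, negatives) + beta * kl_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    rng = random.Random(0)
    loss = module_loss(
        context=[0.5, -0.2],
        mu=[0.3, 0.1],
        log_var=[-1.0, -1.0],
        negatives=[[0.0, 0.4], [-0.6, 0.2]],
        beta=0.1,
        rng=rng,
    )
    print(loss)
```

In this sketch, each module would minimize its own `module_loss` independently, which mirrors the local optimization described in the abstract; the KL term is what pulls the representations toward the standard normal distribution.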