This paper proposes an information-theoretic representation learning framework, named conditional information flow maximization, to extract noise-invariant sufficient representations for the input data and target task. It encourages the learned representations to have good feature uniformity and sufficient predictive ability, which can enhance the generalization of pre-trained language models (PLMs) on the target task. First, an information flow maximization principle is proposed to learn more sufficient representations for the input and target by simultaneously maximizing both the input-representation and representation-label mutual information. Unlike the information bottleneck, we handle the input-representation information in the opposite way, avoiding the over-compression of latent representations. In addition, to mitigate the negative effect of potentially redundant features in the input, we design a conditional information minimization principle that eliminates negative redundant features while preserving noise-invariant features. Experiments on 13 language understanding benchmarks demonstrate that our method effectively improves the performance of PLMs on classification and regression tasks. Extensive experiments show that the learned representations are more sufficient, robust, and transferable.
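To make the objective concrete, the following is a minimal PyTorch sketch of how the information flow maximization principle could be instantiated, assuming a cross-entropy proxy for the representation-label term I(Z; Y) and an InfoNCE lower bound for the input-representation term I(X; Z). The function names, the two-view MI estimator, and the weight `beta` are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def infonce_lower_bound(z_a: torch.Tensor, z_b: torch.Tensor,
                        temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE lower bound on I(X; Z), estimated from two encodings of the
    same batch. A standard MI estimator used here for illustration; the
    paper's actual estimator may differ."""
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature          # (B, B) pairwise similarities
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return -F.cross_entropy(logits, targets)      # higher = more I(X; Z) retained

def information_flow_loss(z_a, z_b, class_logits, labels, beta: float = 0.1):
    """Schematic objective: jointly maximize I(Z; Y) (via the cross-entropy
    proxy) and I(X; Z) (via InfoNCE), instead of compressing I(X; Z) as the
    information bottleneck does. `beta` is a hypothetical trade-off weight;
    the paper's conditional information minimization term (which removes
    negative redundant features) is omitted from this sketch."""
    task_loss = F.cross_entropy(class_logits, labels)  # proxy for maximizing I(Z; Y)
    mi_xz = infonce_lower_bound(z_a, z_b)              # lower bound on I(X; Z)
    return task_loss - beta * mi_xz                    # minimizing this maximizes both MI terms
```

Note the sign on the `beta` term: whereas an information bottleneck objective would add a penalty on I(X; Z), here the bound is subtracted, so gradient descent increases the retained input information alongside the task signal.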