This paper presents Structured Probabilistic Coding (SPC), a new supervised representation learning framework that learns compact and informative representations from input related to the target task. SPC is an encoder-only probabilistic coding technique with a structured regularization derived from the target label space. These task-related representations enhance the generalization ability of pre-trained language models for language understanding. Specifically, SPC encodes the hidden representation into a Gaussian distribution space while maximizing the prior entropy of the latent representations with respect to the label space. It performs information encoding and task prediction in a single module, which makes fuller use of the effective information in the input data, and it applies variational inference in the output space to reduce randomness and uncertainty. To better control the probability distribution in the latent space, a structured regularization is proposed to promote class-level uniformity. With this regularization term, SPC preserves the Gaussian structure of the latent code and covers the hidden space more uniformly across classes. We evaluate SPC on 12 natural language understanding tasks. The results show that SPC effectively improves the performance of pre-trained language models on a variety of classification and regression tasks, and that it enhances generalization capability, robustness to label noise, and the clustering quality of output representations.
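The abstract does not give the objective in closed form, so the following is only a minimal sketch of how an encoder-only probabilistic head with a class-uniformity term might look; it is not the authors' implementation. It assumes, purely for illustration, that the latent code has one dimension per class so that a sampled code doubles as the prediction logits, that the Gaussian prior is standard normal, and that "class-level uniformity" is approximated by the entropy of the batch-averaged class distribution. The function name `spc_style_loss`, the weights `W_mu` and `W_logvar`, and the coefficients `beta` and `gamma` are all hypothetical.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over the last axis."""
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def spc_style_loss(h, y, W_mu, W_logvar, rng, beta=1e-3, gamma=1e-1):
    """Illustrative loss; every modelling choice here is an assumption.

    h        : (batch, d) hidden representations from a pre-trained encoder
    y        : (batch,) integer class labels
    W_mu     : (d, num_classes) hypothetical projection to the Gaussian mean
    W_logvar : (d, num_classes) hypothetical projection to the log-variance
    """
    # Encode each hidden vector as a diagonal Gaussian in the output space.
    mu = h @ W_mu
    logvar = h @ W_logvar

    # Reparameterised sample; the sample itself is used as the logits,
    # so encoding and prediction happen in the same module.
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps
    probs = softmax(z)

    # Task term: cross-entropy of the sampled code against the labels.
    ce = -np.log(probs[np.arange(len(y)), y] + 1e-12).mean()

    # Compression term: KL(N(mu, sigma^2) || N(0, I)), averaged over the batch.
    kl = 0.5 * (np.exp(logvar) + mu**2 - 1.0 - logvar).sum(axis=1).mean()

    # Assumed structured term: entropy of the batch-mean class distribution.
    # It is largest when the batch covers all classes evenly.
    mean_probs = probs.mean(axis=0)
    entropy = -(mean_probs * np.log(mean_probs + 1e-12)).sum()

    total = ce + beta * kl - gamma * entropy
    return total, {"ce": ce, "kl": kl, "entropy": entropy}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h = rng.standard_normal((8, 4))          # stand-in for encoder outputs
    y = np.array([0, 1, 2, 0, 1, 2, 0, 1])   # three hypothetical classes
    W_mu = 0.1 * rng.standard_normal((4, 3))
    W_logvar = np.zeros((4, 3))
    total, parts = spc_style_loss(h, y, W_mu, W_logvar, rng)
    print(total, parts)
```

In this sketch the entropy term is subtracted, so minimizing the total loss pushes the batch toward even class coverage while the KL term keeps each code close to the Gaussian prior. The relative weights `beta` and `gamma` are placeholders and would need tuning for any real task.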