Probabilistic embeddings have several advantages over deterministic embeddings: they map each data point to a distribution, which better captures the uncertainty and complexity of data. Many works adjust the distribution constraint under the Information Bottleneck (IB) principle to enhance representation learning. However, these proposed regularization terms constrain each latent variable only individually, omitting the structural information among latent variables. In this paper, we propose a novel structural entropy-guided probabilistic coding model, named SEPC. Specifically, we incorporate the relationships between latent variables into the optimization by proposing a structural entropy regularization loss. Moreover, since traditional structural information theory is not well-suited to regression tasks, we propose a probabilistic encoding tree that transforms regression tasks into classification tasks while diminishing the influence of the transformation. Experimental results across 12 natural language understanding tasks, covering both classification and regression, demonstrate the superior performance of SEPC compared to other state-of-the-art models in terms of effectiveness, generalization capability, and robustness to label noise. The code and datasets are available at https://github.com/SELGroup/SEPC.