Traditional loss functions used for fine-tuning pre-trained language models such as BERT, including cross-entropy, contrastive, triplet, and supervised contrastive losses, operate only within local neighborhoods and fail to account for the global semantic structure of the data. We present G-Loss, a graph-guided loss function that incorporates semi-supervised label propagation to exploit structural relationships within the embedding manifold. G-Loss builds a document-similarity graph that captures global semantic relationships, thereby guiding the model to learn more discriminative and robust embeddings. We evaluate G-Loss on five benchmark datasets covering key downstream classification tasks: MR (sentiment analysis), R8 and R52 (topic categorization), Ohsumed (medical document classification), and 20NG (news categorization). In the majority of experimental setups, G-Loss converges faster and produces more semantically coherent embedding spaces, yielding higher classification accuracy than models fine-tuned with traditional loss functions.
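The abstract describes G-Loss only at a high level. The NumPy sketch below shows one way a label-propagation loss over a document-similarity graph could be computed. It rests on several assumptions: a fully connected Gaussian-kernel graph over a mini-batch, the classic closed-form label-spreading solution of Zhou et al., and cross-entropy scored only on documents whose labels are withheld from propagation. The function name `graph_propagation_loss` and the parameters `hidden_mask`, `alpha`, and `sigma` are hypothetical. This is not the authors' exact formulation.

```python
import numpy as np


def graph_propagation_loss(embeddings, labels, hidden_mask, num_classes,
                           alpha=0.99, sigma=1.0, eps=1e-12):
    """Hypothetical label-propagation loss for one mini-batch (a sketch,
    not the authors' G-Loss). Computes the loss value only; training would
    need the same operations in an autodiff framework.

    embeddings:  (n, d) float array of document embeddings.
    labels:      (n,) int array of gold class ids.
    hidden_mask: (n,) bool array; True = label withheld from propagation
                 and used to score the loss.
    """
    n = embeddings.shape[0]
    # Fully connected document-similarity graph with a Gaussian kernel.
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    W = np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # drop self-loops

    # Symmetric normalisation S = D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), eps))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    # Seed matrix: one-hot rows for visible nodes, zero rows for hidden ones.
    visible_idx = np.flatnonzero(~hidden_mask)
    Y = np.zeros((n, num_classes))
    Y[visible_idx, labels[visible_idx]] = 1.0

    # Closed-form label spreading: F = (I - alpha * S)^{-1} Y.
    F = np.linalg.solve(np.eye(n) - alpha * S, Y)
    # The exact F is non-negative; clip tiny negative round-off.
    F = np.clip(F, 0.0, None)

    # Row-normalise propagated scores into class distributions.
    P = F / np.maximum(F.sum(axis=1, keepdims=True), eps)

    # Cross-entropy of the propagated predictions on hidden nodes only.
    hidden_idx = np.flatnonzero(hidden_mask)
    picked = np.maximum(P[hidden_idx, labels[hidden_idx]], eps)
    return float(-np.mean(np.log(picked)))
```

Because the propagated prediction for each hidden document depends on every labelled neighbour in the batch, minimising a quantity like this would pull same-class embeddings together through the whole graph rather than one pair or triplet at a time. A real fine-tuning loop would presumably express the same operations in an autodiff framework so that gradients reach the encoder.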