The lack of contextual information in text data can make annotating text-based emotion classification datasets challenging. As a result, such datasets often contain labels that fail to cover all the relevant emotions in the emotion vocabulary. This misalignment between text inputs and labels can degrade the performance of machine learning models trained on them. Because re-annotating entire datasets is costly, time-consuming, and cannot be done at scale, we propose using the expressive capabilities of large language models to synthesize additional context for the input text, increasing its alignment with the annotated emotion labels. In this work, we propose a formal definition of textual context and use it to motivate a prompting strategy that enhances such contextual information. Both human and empirical evaluations show that the enhanced context improves alignment between inputs and their human-annotated labels.