This paper presents a state-of-the-art solution to the LongEval CLEF 2023 Lab Task 2: LongEval-Classification. The goal of this task is to improve and preserve the performance of sentiment analysis models across shorter and longer time periods. Our framework feeds date-prefixed textual inputs to a pre-trained language model, so that each sample's timestamp is part of the input text itself. We show that date-prefixed samples better condition model outputs on the temporal context of the respective texts. We further boost performance by self-labeling unlabeled data to train a student model, and we augment the self-labeling process with a novel augmentation strategy that leverages the date-prefixed formatting of our samples. We demonstrate concrete performance gains on the LongEval-Classification evaluation set over non-augmented self-labeling. Our framework ranks 2nd with an overall score of 0.6923 and reports the best Relative Performance Drop (RPD) of -0.0656 on the short evaluation set.
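As a rough illustration of the two ideas named in the abstract, the sketch below shows one way a timestamp could be prepended to a text sample, and how a relative performance drop could be computed. The function names, the ISO date format, and the separator are hypothetical choices made for illustration, not the paper's actual formatting. The RPD formula is assumed to be the relative change of the later-period score with respect to the within-period score, under which a negative value indicates a drop.

```python
from datetime import date


def add_date_prefix(text: str, timestamp: date, sep: str = " ") -> str:
    """Prepend the sample's date to its text.

    Assumption: ISO format (YYYY-MM-DD) followed by a separator. The
    paper's exact prefix format is not given in the abstract.
    """
    return f"{timestamp.isoformat()}{sep}{text}"


def relative_performance_drop(score_later: float, score_within: float) -> float:
    """Relative change of a later-period score against the within-period score.

    Assumption: RPD = (score_later - score_within) / score_within,
    so negative values mean performance dropped over time.
    """
    return (score_later - score_within) / score_within


# Usage: build a date-prefixed input and compare two hypothetical scores.
sample = add_date_prefix("great phone, love it", date(2016, 3, 1))
rpd = relative_performance_drop(score_later=0.45, score_within=0.50)
```

Keeping the date inside the input string means no architectural change to the pre-trained language model is needed: the tokenizer simply sees the date as ordinary text.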