This paper presents our submission to Task 1, Subjectivity Detection, of the CheckThat! Lab at CLEF 2025. We investigate the effectiveness of transfer learning and stylistic data augmentation for improving the classification of subjective and objective sentences in English news text. Our approach contrasts fine-tuning general-purpose pre-trained encoders with transfer learning from transformers already fine-tuned on related tasks. We also introduce a controlled augmentation pipeline that uses GPT-4o to generate paraphrases in predefined subjectivity styles. To ensure label and style consistency, we employ the same model to correct and refine the generated samples. Results show that transfer learning from task-specialized encoders outperforms fine-tuning general-purpose ones, and that carefully curated augmentation significantly enhances model robustness, especially in detecting subjective content. Our official submission placed $16^{th}$ out of 24 participants. Overall, our findings underscore the value of combining encoder specialization with label-consistent augmentation for improved subjectivity detection. Our code is available at https://github.com/dsgt-arc/checkthat-2025-subject.
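To make the two-stage augmentation pipeline concrete, the sketch below shows one way the generate-then-refine loop could be implemented. It is a minimal illustration, assuming the OpenAI Python client; the style names, prompt wording, and decoding settings are our own assumptions, not the exact prompts used in this work.

```python
# Minimal sketch of the two-stage GPT-4o augmentation loop described above.
# Assumptions (not from the paper): the openai Python client, the style
# names in STYLES, and all prompt wording are illustrative placeholders.
from openai import OpenAI

client = OpenAI()

# Hypothetical predefined subjectivity styles.
STYLES = ["emotive", "opinionated", "speculative"]

def ask(prompt: str) -> str:
    """Single chat completion; returns the model's text response."""
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
    )
    return resp.choices[0].message.content.strip()

def augment(sentence: str, label: str, style: str) -> str:
    # Stage 1: generate a paraphrase in a predefined subjectivity style.
    draft = ask(
        f"Paraphrase the sentence below in a {style}, {label} style, "
        f"preserving its factual content.\nSentence: {sentence}"
    )
    # Stage 2: the same model checks and refines the draft so the
    # paraphrase still matches the intended label and style.
    return ask(
        f"Revise this paraphrase so it is clearly {label} and {style}, "
        f"while keeping the meaning of the original.\n"
        f"Original: {sentence}\nParaphrase: {draft}"
    )
```

In this sketch, reusing the same model for generation and refinement mirrors the label- and style-consistency check described above, trading extra API calls for cleaner augmented labels.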