Half-truths, statements that contain some truth but are ultimately deceptive, have become more prevalent with the increasing use of the internet. To help combat this problem, we present a comprehensive pipeline consisting of a half-truth detection model and a claim editing model. Our approach uses the T5 model for controlled claim editing, where "controlled" means making precise adjustments to selected parts of a claim. It achieves an average BLEU score of 0.88 (on a scale of 0 to 1) and a disinfo-debunk score of 85% on edited claims. Notably, our T5-based approach outperforms other language models such as GPT2, RoBERTa, PEGASUS, and Tailor, with average improvements in disinfo-debunk score of 82%, 57%, 42%, and 23%, respectively. By extending the LIAR PLUS dataset, we achieve an F1 score of 82% for the half-truth detection model, setting a new benchmark in the field. While half-truth detection has been attempted before, our approach is, to the best of our knowledge, the first to attempt to debunk half-truths.