SEPSIS: I Can Catch Your Lies -- A New Paradigm for Deception Detection

Deception is the intentional practice of twisting information. It is a nuanced, multifaceted practice deeply intertwined with the evolution of human society. This research approaches deception through the lens of psychology, using a framework that divides it into three forms: lies of omission, lies of commission, and lies of influence. This study focuses exclusively on lies of omission. We propose a novel framework for deception detection that leverages NLP techniques. We curated an annotated dataset of 876,784 samples by combining a popular large-scale fake news dataset with news headlines scraped from the Twitter handle of the Times of India, a well-known Indian news media house. Each sample is labeled along four layers: (i) the type of omission (speculation, bias, distortion, sounds factual, and opinion), (ii) the color of the lie (black, white, etc.), (iii) the intention behind the lie (to influence, etc.), and (iv) the topic of the lie (political, educational, religious, etc.). We present a novel multi-task learning pipeline that uses dataless merging of fine-tuned language models to address this deception detection task. Our proposed model achieved an F1 score of 0.87, demonstrating strong performance across all four layers: type, color, intent, and topic. Finally, we explore the relationship between lies of omission and propaganda techniques through an in-depth analysis. For instance, the analysis revealed a significant correlation between loaded language and opinion. To encourage further research in this field, we will release the models and dataset under the MIT License.
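
To make the idea of dataless merging concrete, the sketch below combines the parameters of several fine-tuned models without touching any training data. The abstract does not state which merging rule the pipeline uses, so the weighted parameter averaging shown here is an assumption chosen as the simplest instance of the idea. The function name `merge_state_dicts`, the toy parameter names, and the two toy "models" are all hypothetical and serve illustration only.

```python
from typing import Dict, List, Optional, Sequence

# A toy stand-in for a model checkpoint: parameter name -> flat list of values.
# (Hypothetical representation; real checkpoints hold tensors.)
StateDict = Dict[str, List[float]]


def merge_state_dicts(
    models: Sequence[StateDict],
    weights: Optional[Sequence[float]] = None,
) -> StateDict:
    """Merge models by weighted parameter averaging (assumed merging rule).

    No training samples are used: only the parameters themselves are combined.
    """
    if not models:
        raise ValueError("at least one model is required")
    if weights is None:
        # Default to a uniform average over all models.
        weights = [1.0] * len(models)
    if len(weights) != len(models):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")

    # All models must share the same architecture, i.e. the same parameter names.
    names = set(models[0])
    for model in models[1:]:
        if set(model) != names:
            raise ValueError("models do not share the same parameter names")

    merged: StateDict = {}
    for name in models[0]:
        size = len(models[0][name])
        acc = [0.0] * size
        for model, weight in zip(models, weights):
            if len(model[name]) != size:
                raise ValueError(f"shape mismatch for parameter {name!r}")
            for i, value in enumerate(model[name]):
                acc[i] += (weight / total) * value
        merged[name] = acc
    return merged


# Two toy models, imagined as fine-tuned on different label layers.
type_model = {"encoder.weight": [1.0, 2.0], "encoder.bias": [0.0]}
intent_model = {"encoder.weight": [3.0, 4.0], "encoder.bias": [2.0]}

merged = merge_state_dicts([type_model, intent_model])
print(merged)
```

With uniform weights, each merged parameter is the element-wise mean of the corresponding parameters in the input models. Non-uniform `weights` would let one task's model contribute more than another's, which is one way such a merge could be tuned.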
