Can deception be detected solely from written text? Cues of deceptive communication are inherently subtle, even more so in text-only communication. Yet, prior studies have reported considerable success in automatic deception detection. We hypothesize that such findings are largely driven by artifacts introduced during data collection and do not generalize beyond specific datasets. We revisit this assumption by introducing a belief-based deception framework, which defines deception as a misalignment between an author's claims and true beliefs, irrespective of factual accuracy, allowing deception cues to be studied in isolation. Based on this framework, we construct three corpora, collectively referred to as DeFaBel, including a German-language corpus of deceptive and non-deceptive arguments and a multilingual version in German and English, each collected under varying conditions to account for belief change and enable cross-linguistic analysis. Using these corpora, we evaluate commonly reported linguistic cues of deception. Across all three DeFaBel variants, these cues show negligible, statistically insignificant correlations with deception labels, contrary to prior work that treats such cues as reliable indicators. We further benchmark against other English deception datasets following similar data collection protocols. While some show statistically significant correlations, effect sizes remain low and, critically, the set of predictive cues is inconsistent across datasets. We also evaluate deception detection using feature-based models, pretrained language models, and instruction-tuned large language models. While some models perform well on established deception datasets, they consistently perform near chance on DeFaBel. Our findings challenge the assumption that deception can be reliably inferred from linguistic cues and call for rethinking how deception is studied and modeled in NLP.