LLMs, while outperforming humans on a wide range of tasks, can still fail in unanticipated ways. We focus on two pervasive failure modes: (i) hallucinations, where models produce incorrect information about the world, and (ii) the low-resource effect, where models perform impressively in high-resource languages like English but degrade significantly in low-resource languages like Bengali. We study the intersection of these issues and ask: do hallucination detectors suffer from the low-resource effect? We conduct experiments on five tasks across three domains (factual recall, STEM, and the humanities). Experiments with four LLMs and three hallucination detectors reveal a curious finding: as expected, task accuracies in low-resource languages drop sharply relative to English; however, the drop in detector accuracy is often several times smaller than the drop in task accuracy. Our findings suggest that even in low-resource languages, the internal mechanisms of LLMs may encode signals about their own uncertainty. Further, the detectors are robust within a language (even a non-English one) and in multilingual setups, but not in cross-lingual settings without in-language supervision.
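The suggestion that LLM internals encode uncertainty signals is consistent with probing-style hallucination detectors, which train a lightweight classifier on a model's hidden states to predict whether an answer is correct. The sketch below is an illustration only, not the paper's method: the three detectors studied are not specified here, and the names `hidden_states` and `is_correct`, the placeholder random features, and the choice of a logistic-regression probe are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hypothetical setup: in practice, `hidden_states` would be the LLM's
# activations (e.g., last-layer states at the final token) for each
# generated answer, and `is_correct` would come from comparing answers
# against gold labels. Random placeholders keep the sketch standalone.
rng = np.random.default_rng(0)
n_examples, hidden_dim = 1000, 256
hidden_states = rng.normal(size=(n_examples, hidden_dim))
is_correct = rng.integers(0, 2, size=n_examples)  # 1 = correct, 0 = hallucinated

# Train a linear probe to predict answer correctness from hidden states.
X_train, X_test, y_train, y_test = train_test_split(
    hidden_states, is_correct, test_size=0.2, random_state=0
)
probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)

# Detector accuracy: how often the probe's correct/hallucinated
# prediction matches the held-out labels.
print(f"probe accuracy: {accuracy_score(y_test, probe.predict(X_test)):.3f}")
```

Under this framing, the paper's cross-lingual finding would correspond to training the probe on one language's hidden states and evaluating it on another's, which works poorly without in-language supervision.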