Large Language Models (LLMs) have shown great potential in Natural Language Processing (NLP) tasks. However, recent literature reveals that LLMs intermittently generate nonfactual responses, which undermines their reliability in downstream use. In this paper, we propose a novel self-detection method that identifies the questions an LLM does not know and for which it is therefore prone to generating nonfactual results. Specifically, we first diversify the textual expressions of a given question and collect the corresponding answers. We then examine the divergences among the generated answers to identify the questions for which the model may generate falsehoods. All of these steps can be accomplished by prompting the LLMs themselves, without referring to any external resources. We conduct comprehensive experiments and demonstrate the effectiveness of our method on recently released LLMs, e.g., Vicuna, ChatGPT, and GPT-4.
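The two steps described above (rephrase the question, then measure how much the answers disagree) can be illustrated with a minimal sketch. It is not the paper's implementation: the `ask` callable stands in for any LLM call, the question variants are assumed to be supplied by the caller, and answers are grouped by simple normalized string matching rather than by prompting the model. The names `answer_divergence`, `is_suspect`, and the `threshold` value are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter
from typing import Callable, Iterable, Sequence


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings match."""
    return " ".join(answer.lower().split())


def answer_divergence(answers: Iterable[str]) -> float:
    """Shannon entropy (in bits) of the distribution over distinct answers.

    0.0 means every answer agrees; larger values mean more disagreement.
    Assumption: exact match after normalization decides whether two answers agree.
    """
    counts = Counter(normalize(a) for a in answers)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one answer is required")
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def is_suspect(
    question_variants: Sequence[str],
    ask: Callable[[str], str],
    threshold: float = 0.5,  # hypothetical cut-off, would need tuning
) -> bool:
    """Flag a question when answers to its rephrasings diverge too much."""
    answers = [ask(q) for q in question_variants]
    return answer_divergence(answers) > threshold


if __name__ == "__main__":
    # Stub "model" for demonstration only: a fixed lookup table.
    canned = {
        "What is the capital of France?": "Paris",
        "Which city is France's capital?": "paris",
        "France's capital city is?": "Paris",
    }
    print(is_suspect(list(canned), canned.__getitem__))
```

Entropy is only one possible divergence score; any measure that grows as the answers spread across more distinct groups would fit the same slot.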