Honesty is a fundamental principle for aligning large language models (LLMs) with human values, requiring these models to recognize what they know and don't know and to faithfully express their knowledge. Despite their promise, current LLMs still exhibit significant dishonest behaviors, such as confidently presenting wrong answers or failing to express what they know. In addition, research on the honesty of LLMs faces challenges, including varying definitions of honesty, difficulties in distinguishing between known and unknown knowledge, and the lack of a comprehensive overview of related research. To address these issues, we provide a survey on the honesty of LLMs, covering its clarification, evaluation approaches, and strategies for improvement. Moreover, we offer insights for future research, aiming to inspire further exploration in this important area.