The development of Large Language Models (LLMs) has transformed numerous sectors by offering impressive text generation capabilities. Yet the reliability and truthfulness of these models remain pressing concerns. To this end, we investigate iterative prompting, a strategy hypothesized to refine LLM responses, and assess its impact on LLM truthfulness, which has not been thoroughly explored. Through extensive experiments, we examine how iterative prompting variants affect the accuracy and calibration of model responses. We find that naive prompting methods significantly undermine truthfulness and exacerbate calibration errors. In response, we introduce several prompting variants designed to address these issues. These variants show marked improvements over existing baselines, signaling a promising direction for future research. Our work provides a nuanced understanding of iterative prompting and introduces novel approaches to enhance the truthfulness of LLMs, contributing to the development of more accurate and trustworthy AI systems.