The use of Large Language Models (LLMs) has increased significantly in recent years, with individuals frequently interacting with chatbots to receive answers to a wide range of questions. In an era where information is readily accessible, it is crucial to stimulate and preserve human cognitive abilities and maintain strong reasoning skills. This paper addresses these challenges by promoting the use of hints as an alternative or a supplement to direct answers. We first introduce a manually constructed hint dataset, WIKIHINT, which includes 5,000 hints created for 1,000 questions. We then fine-tune open-source LLMs such as LLaMA-3.1 for hint generation in answer-aware and answer-agnostic settings. We assess the effectiveness of the hints with human participants who try to answer questions with and without the aid of hints. Additionally, we introduce a lightweight evaluation method, HINTRANK, to evaluate and rank hints in both answer-aware and answer-agnostic settings. Our findings show that (a) the dataset helps generate more effective hints, (b) including answer information along with questions generally improves hint quality, and (c) encoder-based models perform better than decoder-based models in hint ranking.