Large Language Models (LLMs) have achieved remarkable success across various industries due to their exceptional generative capabilities. However, for safe and effective real-world deployment, ensuring honesty and helpfulness is critical. This paper addresses the question: Can we prioritize the helpfulness of LLMs while preserving their honesty? To begin with, we establish exhaustive principles aimed at guaranteeing the honesty of LLMs. Additionally, we introduce a novel dataset, referred to as HoneSet, comprising 930 queries spanning six categories, meticulously crafted to assess an LLM's capacity for maintaining honesty. Subsequently, we present two approaches to augmenting honesty and helpfulness in LLMs: a training-free enhancement and a fine-tuning-based improvement. The training-free approach, which is based on curiosity-driven prompting, empowers LLMs to articulate internal confusion and uncertainty regarding queries, thereby optimizing their responses. In contrast, the fine-tuning-based method employs a two-stage process inspired by curriculum learning: first instructing LLMs to discern between honest and dishonest responses, then refining their training to enhance helpfulness. Experiments conducted on nine prominent LLMs demonstrate a significant improvement in alignment with honesty across all models through the implementation of our proposed enhancements. Particularly noteworthy are the 65.3% enhancement observed in Llama3-8b and the remarkable 124.7% improvement in Mistral-7b, as measured by the H$^{2}$ (honest and helpful) assessment. We believe that our work can pave the way for developing more trustworthy LLMs for real-world applications.