The rapid progress of Large Language Models (LLMs) poses potential risks such as generating unethical content. Assessing LLMs' values can help expose their misalignment, but this relies on reference-free evaluators, e.g., fine-tuned LLMs or closed-source ones like GPT-4, to identify the values reflected in generated responses. These evaluators face two challenges in open-ended value evaluation: they must align with changing human value definitions using minimal annotation while resisting their own biases (adaptability), and they must detect varying value expressions and scenarios robustly (generalizability). To address these challenges, we introduce CLAVE, a novel framework that integrates two complementary LLMs: a large one that extracts high-level value concepts from a few human labels, leveraging its extensive knowledge and generalizability, and a smaller one fine-tuned on such concepts to better align with human value understanding. This dual-model approach enables calibration to any value system with fewer than 100 human-labeled samples per value type. We further present ValEval, a comprehensive dataset comprising 13k+ (text, value, label) tuples across diverse domains, covering three major value systems. We benchmark 12+ popular LLM evaluators and analyze their strengths and weaknesses. Our findings reveal that combining fine-tuned small models with prompt-based large ones offers a superior balance in value evaluation.
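The dual-model pipeline described above can be sketched as a two-stage evaluator: a concept-extraction stage (played by a large LLM) followed by a classification stage (played by a small fine-tuned model). The sketch below is purely illustrative and is not the paper's actual implementation; the two model calls are replaced by keyword heuristics, and all function names (`extract_concepts`, `classify`, `evaluate`) are hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    text: str    # the LLM response to be evaluated
    value: str   # target value type, e.g. "honesty"


def extract_concepts(sample: Sample) -> list[str]:
    """Stage 1 (stand-in for a large LLM prompt): abstract the raw text
    into high-level value concepts, ignoring surface wording."""
    concepts = []
    lowered = sample.text.lower()
    if "lie" in lowered or "deceive" in lowered:
        concepts.append("deception")
    if "help" in lowered:
        concepts.append("benevolence")
    return concepts


def classify(concepts: list[str], value: str) -> str:
    """Stage 2 (stand-in for a small fine-tuned model): decide whether the
    extracted concepts support or oppose the target value."""
    # Hypothetical mapping from value types to violating concepts.
    violations = {"honesty": {"deception"}}
    if set(concepts) & violations.get(value, set()):
        return "oppose"
    return "support"


def evaluate(sample: Sample) -> str:
    """Full pipeline: concepts first, then a label against the value."""
    return classify(extract_concepts(sample), sample.value)


print(evaluate(Sample("The assistant suggests you lie to your boss.", "honesty")))
# prints "oppose"
```

Because the small model only sees concepts, not raw text, recalibrating to a new value definition would only require relabeling a handful of concept-level examples rather than retraining on diverse surface expressions.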