BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset

In this paper, we introduce the BeaverTails dataset, aimed at fostering research on safety alignment in large language models (LLMs). This dataset uniquely separates annotations of helpfulness and harmlessness for question-answering pairs, thus offering distinct perspectives on these crucial attributes. In total, we have gathered safety meta-labels for 333,963 question-answer (QA) pairs and 361,903 pairs of expert comparison data for both the helpfulness and harmlessness metrics. We further showcase applications of BeaverTails in content moderation and reinforcement learning with human feedback (RLHF), emphasizing its potential for practical safety measures in LLMs. We believe this dataset provides vital resources for the community, contributing towards the safe development and deployment of LLMs. Our project page is available at the following URL: https://sites.google.com/view/pku-beavertails. Warning: this paper contains example data that may be offensive or harmful.
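To make the separation of helpfulness and harmlessness annotations concrete, here is a minimal Python sketch of how such records might be represented. The class names, field names, and the `conflicting` helper are hypothetical illustrations and are not taken from the BeaverTails release; the actual schema may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class QAPair:
    """One question-answer pair with a safety meta-label (hypothetical schema)."""
    question: str
    answer: str
    is_safe: bool  # assumed boolean form of the safety meta-label


@dataclass(frozen=True)
class Comparison:
    """Two answers to one question, ranked separately on each metric (hypothetical schema)."""
    question: str
    answer_a: str
    answer_b: str
    more_helpful: str   # "a" or "b"
    more_harmless: str  # "a" or "b"


def conflicting(comparisons: List[Comparison]) -> List[Comparison]:
    """Return comparisons where the more helpful answer is not the more harmless one."""
    return [c for c in comparisons if c.more_helpful != c.more_harmless]


# Placeholder data for illustration only.
examples = [
    Comparison("q1", "answer a", "answer b", more_helpful="a", more_harmless="a"),
    Comparison("q2", "answer a", "answer b", more_helpful="a", more_harmless="b"),
]
disagreements = conflicting(examples)
```

Keeping the two preferences in separate fields lets cases where they disagree be selected directly, which a single merged preference label would hide.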


