Developing value-aligned AI agents is a complex and ongoing challenge in AI. For Large Language Models (LLMs) in particular, it is important to be able to consolidate multiple independently trained dialogue agents, each aligned with a distinct moral value, into a unified system that can adapt to and align with multiple moral values. In this paper, we propose a system that performs contextual moral value alignment through contextual aggregation. Here, aggregation is defined as the process of integrating the subset of LLM responses best suited to a user input, taking into account features extracted from that input. The proposed system shows better alignment with human values than the state of the art.
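To make the definition of aggregation concrete, the following is a minimal sketch and not the system proposed in the paper. It assumes a toy keyword-based feature extractor, a hypothetical `ValueAgent` wrapper whose `respond` callable stands in for a real LLM call, and illustrative value names (`care`, `fairness`, `authority`) that are not taken from the source. The only idea it carries over is the one stated in the abstract: extract features from the user's input, pick the subset of agents best suited to it, and integrate their responses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class ValueAgent:
    """Hypothetical stand-in for one dialogue agent aligned with one moral value."""
    value: str
    keywords: Tuple[str, ...]      # toy cues that make this value relevant (assumption)
    respond: Callable[[str], str]  # placeholder for a real LLM call


def extract_features(user_input: str) -> Set[str]:
    """Toy feature extractor: the set of lowercased, punctuation-stripped words."""
    return {w.strip(".,!?;:").lower() for w in user_input.split()}


def score_agents(features: Set[str], agents: List[ValueAgent]) -> Dict[str, int]:
    """Relevance of an agent = number of its keywords present in the input."""
    return {a.value: sum(k in features for k in a.keywords) for a in agents}


def aggregate(user_input: str, agents: List[ValueAgent], k: int = 2) -> List[Tuple[str, str]]:
    """Return (value, response) pairs from the k most relevant agents.

    Agents with a relevance score of zero are dropped, so the result
    may contain fewer than k entries (or none at all).
    """
    scores = score_agents(extract_features(user_input), agents)
    ranked = sorted(agents, key=lambda a: scores[a.value], reverse=True)
    chosen = [a for a in ranked[:k] if scores[a.value] > 0]
    return [(a.value, a.respond(user_input)) for a in chosen]


# Illustrative agents; the canned replies replace real model outputs.
AGENTS = [
    ValueAgent("care", ("hurt", "help", "suffering"), lambda q: "[care] reply"),
    ValueAgent("fairness", ("fair", "cheat", "equal"), lambda q: "[fairness] reply"),
    ValueAgent("authority", ("law", "rule", "obey"), lambda q: "[authority] reply"),
]

if __name__ == "__main__":
    for value, reply in aggregate("Is it fair to cheat if it would help a friend?", AGENTS):
        print(value, "->", reply)
```

In this sketch the selection step is a simple top-k over keyword counts and the integration step just returns the chosen responses side by side. A real system would presumably replace both with learned components; the abstract does not specify how either step is implemented.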