The impact of artificial intelligence systems on our society is increasing at an unprecedented speed. For instance, ChatGPT is being tested in mental health treatment applications such as Koko, and Stable Diffusion generates pieces of art competitive with (or outperforming) those of human artists. Ethical concerns regarding the behavior and applications of generative AI systems have grown over the past years, and AI alignment (steering the behavior of AI systems towards human values) is a rapidly growing subfield of modern AI. In this paper, we address the challenges involved in the ethical evaluation of a multimodal artificial intelligence system. The multimodal systems we focus on take both a text and an image as input and output text, either completing the input sentence or answering the input question. We evaluate these models in two steps: we first discuss the creation of a multimodal ethical database and then use this database to construct morality-evaluating algorithms. The multimodal ethical database is created interactively through human feedback: users are presented with multiple examples and vote on whether each one is ethical or not. Once these votes have been aggregated into a dataset, we build and test different algorithms to automatically evaluate the morality of multimodal systems. These algorithms aim to classify the answers as ethical or not. The models we test are a RoBERTa-large classifier and a multilayer perceptron classifier.
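The abstract does not state how individual user votes are combined into dataset labels. The sketch below illustrates one plausible rule, assuming a simple strict-majority vote in which ties and examples with too few votes are discarded. The record format, the function `aggregate_votes`, and the `min_votes` threshold are hypothetical and introduced only for illustration; they are not the authors' method.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical record format: (example_id, user_id, vote), where vote is True
# if the user judged the model's answer ethical and False otherwise.
Vote = Tuple[str, str, bool]


def aggregate_votes(votes: List[Vote], min_votes: int = 3) -> Dict[str, bool]:
    """Turn raw user votes into one binary label per example.

    Assumptions (not taken from the paper): each user counts once per example
    (their latest vote wins), a strict majority decides the label, and examples
    with fewer than `min_votes` votes or a tie are left out of the dataset.
    """
    # Keep only the latest vote of each user on each example.
    latest: Dict[Tuple[str, str], bool] = {}
    for example_id, user_id, vote in votes:
        latest[(example_id, user_id)] = vote

    # Count ethical (True) and unethical (False) votes per example.
    tallies: Dict[str, Counter] = {}
    for (example_id, _user_id), vote in latest.items():
        tallies.setdefault(example_id, Counter())[vote] += 1

    labels: Dict[str, bool] = {}
    for example_id, counts in tallies.items():
        ethical, unethical = counts[True], counts[False]
        if ethical + unethical < min_votes or ethical == unethical:
            continue  # too few votes or no majority: skip this example
        labels[example_id] = ethical > unethical
    return labels


if __name__ == "__main__":
    votes = [
        ("ex1", "u1", True), ("ex1", "u2", True), ("ex1", "u3", False),
        ("ex2", "u1", False), ("ex2", "u2", True),  # only 2 votes: dropped
        ("ex3", "u1", False), ("ex3", "u2", False),
        ("ex3", "u3", False), ("ex3", "u3", True),  # u3 changed their vote
    ]
    print(aggregate_votes(votes))  # {'ex1': True, 'ex3': False}
```

The resulting labels would then serve as binary targets for classifiers such as the RoBERTa-large and multilayer perceptron models mentioned above. Discarding ties rather than breaking them arbitrarily trades dataset size for label reliability; a weighted or probabilistic aggregation would be an equally valid assumption.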