Generative AI systems (ChatGPT, DALL-E, etc.) are expanding into multiple areas of our lives, from art [Rombach et al., 2021] to mental health [Rob Morris and Kareem Kouddous, 2022]; their rapidly growing societal impact opens new opportunities but also raises ethical concerns. The emerging field of AI alignment aims to make AI systems reflect human values. This paper focuses on evaluating the ethics of multimodal AI systems involving both text and images, a relatively under-explored area, since most alignment work currently focuses on language models. We first create a multimodal ethical database from human feedback on ethicality. Then, using this database, we develop algorithms, including a RoBERTa-large classifier and a multilayer perceptron, to automatically assess the ethicality of system responses.