Evaluation of AI Ethics Tools in Language Models: A Developers' Perspective Case Study

In Artificial Intelligence (AI), language models have gained significant importance due to the widespread adoption of systems capable of simulating realistic conversations with humans through text generation. Because of their impact on society, these language models must be developed and deployed responsibly, with attention to their negative impacts and possible harms. In this context, the number of publications on AI Ethics Tools (AIETs) has recently increased. AIETs are designed to help developers, companies, governments, and other stakeholders establish trust, transparency, and responsibility in their technologies by bringing accepted values to guide the design, development, and use of AI. However, many AIETs lack good documentation, examples of use, and proof of their effectiveness in practice. This paper presents a methodology for evaluating AIETs in language models. Our approach involved an extensive literature survey of 213 AIETs; after applying inclusion and exclusion criteria, we selected four: Model Cards, ALTAI, FactSheets, and Harms Modeling. For the evaluation, we applied these AIETs to language models developed for the Portuguese language and conducted 35 hours of interviews with their developers. The evaluation considered the developers' perspective on the use of the AIETs and on their quality in helping to identify ethical considerations about their models. The results suggest that the applied AIETs serve as a guide for formulating general ethical considerations about language models. However, we note that they do not address aspects unique to these models, such as idiomatic expressions. Additionally, these AIETs did not help to identify potential negative impacts of models for the Portuguese language.

