RAISE: A Unified Framework for Responsible AI Scoring and Evaluation

As AI systems enter high-stakes domains, evaluation must extend beyond predictive accuracy to include explainability, fairness, robustness, and sustainability. We introduce RAISE (Responsible AI Scoring and Evaluation), a unified framework that quantifies model performance across these four dimensions and aggregates the results into a single, holistic Responsibility Score. We evaluate three deep learning models on structured datasets from finance, healthcare, and socioeconomics: a Multilayer Perceptron (MLP), a Tabular ResNet, and a Feature Tokenizer Transformer. Our findings reveal critical trade-offs: the MLP demonstrates strong sustainability and robustness, the Transformer excels in explainability and fairness but at a very high environmental cost, and the Tabular ResNet offers a balanced profile. These results show that no single model dominates across all responsibility criteria, which makes multi-dimensional evaluation necessary for responsible model selection. Our implementation is available at: https://github.com/raise-framework/raise.
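
The abstract does not say how the four dimension scores are combined, so the following is only a minimal sketch of one plausible aggregation: a weighted mean of scores that are assumed to be already normalized to [0, 1]. The names `DimensionScores` and `responsibility_score`, the equal default weights, and the example numbers are all hypothetical and are not taken from the RAISE implementation.

```python
from dataclasses import dataclass, asdict
from typing import Mapping, Optional


@dataclass(frozen=True)
class DimensionScores:
    """Hypothetical container; each score is assumed normalized to [0, 1]."""
    explainability: float
    fairness: float
    robustness: float
    sustainability: float


def responsibility_score(
    scores: DimensionScores,
    weights: Optional[Mapping[str, float]] = None,
) -> float:
    """Weighted mean of the four dimension scores.

    Assumption: the real framework may aggregate differently; equal
    weights are used here only because the abstract names no weighting.
    """
    values = asdict(scores)
    for name, value in values.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")

    if weights is None:
        weights = {name: 1.0 for name in values}
    if set(weights) != set(values):
        raise ValueError("weights must cover exactly the four dimensions")
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")

    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    return sum(values[name] * weights[name] for name in values) / total


if __name__ == "__main__":
    # Made-up illustrative numbers, not results from the paper.
    example = DimensionScores(
        explainability=0.6, fairness=0.7, robustness=0.8, sustainability=0.9
    )
    print(responsibility_score(example))
```

Passing a `weights` mapping lets a reader explore how the ranking of models could shift when one dimension, for instance sustainability, is given more emphasis. That is the kind of trade-off the abstract describes.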

