Where Assessment Validation and Responsible AI Meet

Validity, reliability, and fairness are core ethical principles embedded in classical argument-based assessment validation theory. These principles are also central to the Standards for Educational and Psychological Testing (2014), which recommended best practices for early applications of artificial intelligence (AI) in high-stakes assessments for the automated scoring of written and spoken responses. Responsible AI (RAI) principles and practices set forth by the AI ethics community are critical to ensuring the ethical use of AI across industry domains. Advances in generative AI have led to new policies and guidance on implementing RAI principles for assessments that use AI. Building on Chapelle's foundational validity argument work to address the application of assessment validation theory to technology-based assessment, we propose a unified assessment framework that considers classical test validation theory together with assessment-specific and domain-agnostic RAI principles and practices. The framework addresses responsible AI use for assessment that supports validity arguments, alignment with AI ethics to maintain human values and oversight, and the broader social responsibility associated with AI use.

