Validity, reliability, and fairness are core ethical principles embedded in classical argument-based assessment validation theory. These principles are also central to the Standards for Educational and Psychological Testing (2014), which recommended best practices for early applications of artificial intelligence (AI) in high-stakes assessments, specifically the automated scoring of written and spoken responses. Responsible AI (RAI) principles and practices set forth by the AI ethics community are critical to ensuring the ethical use of AI across industry domains. Advances in generative AI have led to new policies and guidance on implementing RAI principles for assessments that use AI. Building on Chapelle's foundational validity argument work on applying assessment validation theory to technology-based assessment, we propose a unified assessment framework that considers classical test validation theory alongside assessment-specific and domain-agnostic RAI principles and practices. The framework addresses three areas: responsible AI use for assessment that supports validity arguments, alignment with AI ethics to maintain human values and oversight, and the broader social responsibility associated with AI use.