Machine learning (ML) and artificial intelligence (AI) approaches are often criticized for their inherent bias and for their lack of control, accountability, and transparency. Consequently, regulatory bodies struggle to contain this technology's potential negative side effects. High-level requirements such as fairness and robustness need to be formalized into concrete specification metrics: imperfect proxies that capture isolated aspects of the underlying requirements. Given possible trade-offs between different metrics and their vulnerability to over-optimization, integrating specification metrics into system development processes is not trivial. This paper defines specification overfitting, a scenario where systems focus excessively on specified metrics to the detriment of high-level requirements and task performance. We present an extensive literature survey to categorize how researchers propose, measure, and optimize specification metrics in several AI fields (e.g., natural language processing, computer vision, reinforcement learning). Using a keyword-based search over papers from major AI conferences and journals published between 2018 and mid-2023, we identify and analyze 74 papers that propose or optimize specification metrics. We find that although most papers implicitly address specification overfitting (e.g., by reporting more than one specification metric), they rarely discuss what role specification metrics should play in system development or explicitly define the scope and assumptions behind metric formulations.