AI alignment refers to models acting towards human-intended goals, preferences, or ethical principles. Because most large-scale deep learning models act as black boxes and cannot be manually controlled, analyzing the similarity between models and humans can serve as a proxy measure for ensuring AI safety. In this paper, we focus on the alignment of models' visual perception with that of humans, hereafter referred to as AI-human visual alignment. Specifically, we propose a new dataset for measuring AI-human visual alignment in image classification, a fundamental task in machine perception. To evaluate AI-human visual alignment, a dataset should cover the variety of scenarios that may arise in the real world and provide gold human perception labels. Our dataset consists of three groups of samples, namely Must-Act (i.e., Must-Classify), Must-Abstain, and Uncertain, defined by the quantity and clarity of visual information in an image and further divided into eight categories. All samples have a gold human perception label; even the labels for Uncertain (severely blurry) samples were obtained via crowd-sourcing. The validity of our dataset is verified by sampling theory, statistical theories related to survey design, and experts in the related fields. Using our dataset, we analyze the visual alignment and reliability of five popular visual perception models and seven abstention methods. Our code and data are available at https://github.com/jiyounglee-0523/VisAlign.