Evaluating Perceptual Distance Models by Fitting Binomial Distributions to Two-Alternative Forced Choice Data

The two-alternative forced choice (2AFC) experimental method is popular in the visual perception literature, where practitioners aim to understand how human observers perceive distances within triplets made of a reference image and two distorted versions. In the past, these experiments were conducted in controlled environments with triplets that shared images, which made it possible to rank the perceived quality. This ranking was then used to evaluate perceptual distance models against the experimental data. Recently, crowd-sourced perceptual datasets have emerged in which no images are shared between triplets, making ranking infeasible. Evaluating perceptual distance models on this data reduces the judgements on a triplet to a binary decision, namely whether the distance model agrees with the human decision, which is suboptimal and prone to misleading conclusions. Instead, we statistically model the underlying decision-making process during 2AFC experiments using a binomial distribution. Given enough empirical data, we estimate a smooth and consistent distribution of the judgements on the reference-distorted distance plane, according to each distance model. By applying maximum likelihood, we estimate the parameter of the local binomial distribution and obtain a global measurement of the expected log-likelihood of the measured responses. This lets us calculate meaningful and well-founded metrics for the distance model that go beyond mere prediction accuracy as percentage agreement, even with variable numbers of judgements per triplet, which are key advantages over both classical and neural network methods.
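To make the binomial idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes each triplet comes with a count `k` of judgements favouring one distorted image out of `n` total judgements, and that some distance model has been turned into a predicted choice probability `p` for that triplet. The function names, the toy counts, and the predicted probabilities are all hypothetical; the local estimation on the reference-distorted distance plane described above is not reproduced here.

```python
import math


def binomial_mle(k, n):
    """Maximum-likelihood estimate of the binomial parameter: k successes in n trials."""
    if n <= 0:
        raise ValueError("n must be positive")
    return k / n


def binomial_log_likelihood(k, n, p, eps=1e-12):
    """Log-probability of observing k successes in n trials under Binomial(n, p).

    p is clipped away from 0 and 1 so the logarithms stay finite.
    """
    p = min(max(p, eps), 1.0 - eps)
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_coeff + k * math.log(p) + (n - k) * math.log(1.0 - p)


# Hypothetical data: (judgements favouring image A, total judgements) per triplet.
# Note that the number of judgements differs between triplets.
triplets = [(4, 5), (1, 3), (2, 2)]

# Hypothetical probabilities of choosing A, as implied by some distance model.
predicted = [0.7, 0.4, 0.9]

# Average log-likelihood of the observed responses under the model's predictions.
mean_log_likelihood = sum(
    binomial_log_likelihood(k, n, p) for (k, n), p in zip(triplets, predicted)
) / len(triplets)

print(f"mean log-likelihood: {mean_log_likelihood:.4f}")
```

Unlike percentage agreement, which only records whether the model sides with the majority, the log-likelihood uses every judgement and handles triplets with different numbers of judgements without extra bookkeeping.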

