Paired comparison data, where users evaluate items in pairs, play a central role in ranking and preference learning tasks. While ordinal comparison data intuitively offer richer information than binary comparisons, this paper challenges that conventional wisdom. We propose a general parametric framework for modeling ordinal paired comparisons without ties. The model adopts a generalized additive structure, featuring a link function that quantifies the preference difference between two items and a pattern function that governs the distribution over ordinal response levels. This framework encompasses classical binary comparison models as special cases, by treating binary responses as binarized versions of ordinal data. Within this framework, we show that binarizing ordinal data can significantly improve the accuracy of ranking recovery. Specifically, we prove that under the counting algorithm, the ranking error associated with binary comparisons exhibits a faster exponential convergence rate than that of ordinal data. Furthermore, we characterize a substantial performance gap between binary and ordinal data in terms of a signal-to-noise ratio (SNR) determined by the pattern function. We identify the pattern function that minimizes the SNR and maximizes the benefit of binarization. Extensive simulations and a real application on the MovieLens dataset further corroborate our theoretical findings.
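To make the comparison concrete, the following is a minimal sketch of a counting-style ranking applied once to ordinal responses and once to their binarized version. It rests on assumptions not stated in the abstract: responses are coded as nonzero signed integers (positive when the first item of a pair is preferred, with magnitude as preference strength), binarization keeps only the sign, and the counting score of an item is the sum of its signed responses. The names `binarize`, `counting_scores`, and `rank_items` are hypothetical, and the toy data are invented for illustration; they are not results from the paper.

```python
import numpy as np


def binarize(responses):
    """Collapse signed ordinal responses (no ties, so never 0) to +1 / -1."""
    return np.sign(responses)


def counting_scores(n_items, pairs, responses):
    """Sum signed responses per item: +y for the first item of a pair, -y for the second."""
    scores = np.zeros(n_items)
    for (i, j), y in zip(pairs, responses):
        scores[i] += y
        scores[j] -= y
    return scores


def rank_items(scores):
    """Return item indices ordered from highest to lowest score."""
    return np.argsort(-scores, kind="stable")


# Invented toy data: three items; the pair (1, 2) is compared three times.
pairs = [(0, 1), (0, 1), (0, 2), (0, 2), (1, 2), (1, 2), (1, 2)]
# Assumed coding: y > 0 means the first item is preferred, |y| is the ordinal level.
ordinal = np.array([1, 1, 2, 1, -3, 1, 1])

ordinal_ranking = rank_items(counting_scores(3, pairs, ordinal))
binary_ranking = rank_items(counting_scores(3, pairs, binarize(ordinal)))

print("ordinal ranking:", ordinal_ranking.tolist())
print("binary ranking: ", binary_ranking.tolist())
```

In this toy input, item 1 wins two of its three comparisons against item 2, but the single level-3 response in favor of item 2 outweighs them in the ordinal scores, so the two rankings disagree on the order of items 1 and 2. This only illustrates the mechanism by which extreme ordinal levels can dominate a count; it does not reproduce the paper's convergence-rate analysis.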