Determining a winner among a set of items through active pairwise comparisons under a limited budget is a challenging problem in preference-based learning. The goal of this study is to implement and evaluate the PARWiS algorithm, which combines spectral ranking with disruptive pair selection to identify the best item under shoestring budgets. This work extends PARWiS with a contextual variant (Contextual PARWiS) and a reinforcement learning-based variant (RL PARWiS), comparing them against baselines including Double Thompson Sampling and a random selection strategy. The evaluation spans synthetic and real-world datasets (Jester and MovieLens), using budgets of 40, 60, and 80 comparisons for 20 items. Performance is measured through recovery fraction, true rank of the reported winner, reported rank of the true winner, and cumulative regret, alongside the separation metric \(Δ_{1,2}\). Results show that PARWiS and RL PARWiS outperform the baselines across all datasets, particularly on the Jester dataset with its larger \(Δ_{1,2}\), while performance gaps narrow on the more challenging MovieLens dataset with a smaller \(Δ_{1,2}\). Contextual PARWiS performs comparably to PARWiS, indicating that contextual features may require further tuning to provide significant benefits.
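To make the evaluation metrics concrete, the following is a minimal sketch of how the recovery fraction and the separation metric might be computed. It assumes \(Δ_{1,2}\) is the normalized gap between the two highest underlying item scores (a common convention; the paper's exact definition may differ), and the score values and run outcomes below are purely illustrative, not taken from the study's data.

```python
import numpy as np

# Hypothetical true item scores for 20 items, sorted in descending order
# (illustrative only; not the paper's synthetic or real-world data).
rng = np.random.default_rng(0)
true_scores = np.sort(rng.uniform(0.1, 1.0, size=20))[::-1]

# Separation metric Delta_{1,2}: normalized gap between the top two scores
# (assumed convention; check the paper for the exact definition).
delta_12 = (true_scores[0] - true_scores[1]) / true_scores[0]

# Recovery fraction: fraction of independent runs in which the reported
# winner matches the true winner (item index 0 after sorting).
reported_winners = [0, 0, 1, 0, 0]  # illustrative outcomes of 5 runs
recovery_fraction = np.mean([w == 0 for w in reported_winners])

print(f"Delta_12 = {delta_12:.3f}, recovery fraction = {recovery_fraction:.2f}")
```

A larger \(Δ_{1,2}\) means the best item is more clearly separated from the runner-up, which is why recovery is easier on datasets such as Jester than on MovieLens under the same comparison budget.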