Interactive preference learning systems present humans with queries as pairs of options; humans then select their preferred option, and the system infers preferences from these binary choices. While binary choice feedback is simple and widely used, it conveys little about preference strength. To address this, we leverage human response times, which inversely correlate with preference strength, as complementary information. We introduce a computationally efficient method based on the EZ-diffusion model that combines choices and response times to estimate the underlying human utility function. Theoretical and empirical comparisons with traditional choice-only estimators show that for queries where humans have strong preferences (i.e., "easy" queries), response times provide valuable complementary information and improve utility estimates. We integrate this estimator into preference-based linear bandits for fixed-budget best-arm identification. Simulations on three real-world datasets demonstrate that incorporating response times significantly accelerates preference learning.
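To illustrate the intuition (this is a rough sketch, not the paper's exact estimator): under a diffusion-model account of binary choice, the moment equations E[c] = tanh(a·v) and E[t] = (a/v)·tanh(a·v) hold, where c ∈ {+1, −1} is the choice, t the decision time, v the drift (utility difference), and a the decision barrier. Dividing the two gives v = a·E[c]/E[t], so a simple moment-matching estimate combines choices and response times. The function name and the default barrier below are illustrative assumptions.

```python
import numpy as np

def estimate_utility_diff(choices, times, barrier=1.0):
    """Moment-matching drift estimate under DDM relations
    E[c] = tanh(a*v), E[t] = (a/v) * tanh(a*v), which imply
    v = a * E[c] / E[t].  (Illustrative sketch; `barrier` is assumed known.)

    choices : array-like of +1/-1 selections for one query
    times   : array-like of positive response times (seconds)
    """
    choices = np.asarray(choices, dtype=float)
    times = np.asarray(times, dtype=float)
    return barrier * choices.mean() / times.mean()

# Example: mostly-positive choices with fast responses imply a large
# positive utility difference; slow, mixed responses imply a small one.
v_hat = estimate_utility_diff([1, 1, -1, 1], [0.5, 1.0, 1.5, 1.0])
```

Note how fast response times (small E[t]) inflate the magnitude of the estimate, capturing the inverse relation between response time and preference strength described above.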