Efficient learning of user preferences is crucial for many modern decision-making systems but typically requires costly labeled data. Active learning reduces this cost, yet standard methods are computationally expensive due to pool-based evaluation. Further, most methods assume all query feedback is equally reliable, ignoring that pairwise queries between nearly identical or entirely dissimilar items yield ambiguous, low-confidence responses. To address feedback reliability, we introduce a novel confidence-aware response model that explicitly accounts for these ambiguous comparisons. To overcome the computational bottleneck of pool-based evaluation, we propose an active query synthesis framework, Info-Synth, that generates optimal queries by maximizing a mutual information-based objective within a continuous space. Moreover, we propose two strategies, Pair M-dist and Pair Opt-dist, that extend Info-Synth to select effective queries even when restricted to finite query pools. We demonstrate our framework's versatility and performance across synthetic preference learning, constrained text summary datasets, and subjective, continuous-space controller gain tuning for a simulated mobile robot.
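As a rough illustration of a mutual information-based query objective and of why pool-based evaluation is costly, the following minimal sketch scores every pair in a finite pool. It is not Info-Synth, Pair M-dist, or Pair Opt-dist, and it does not use the confidence-aware response model: it assumes a standard logistic (Bradley-Terry style) response, a linear utility, and a set of posterior samples over the preference weights. The names `mutual_information`, `best_pool_query`, `w_samples`, and `pool` are hypothetical.

```python
import numpy as np

def binary_entropy(p):
    """Entropy (in nats) of a Bernoulli(p) variable, safe at p = 0 or 1."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def mutual_information(x_a, x_b, w_samples):
    """Monte Carlo estimate of I(y; w | query) for the pair (x_a, x_b)."""
    # Assumed response model: P(a preferred over b | w) is logistic in the
    # utility difference w . (x_a - x_b).
    logits = w_samples @ (x_a - x_b)
    p = 1.0 / (1.0 + np.exp(-logits))
    # Entropy of the averaged prediction minus the average entropy.
    return binary_entropy(p.mean()) - binary_entropy(p).mean()

def best_pool_query(pool, w_samples):
    """Return the index pair (i, j), i < j, with the highest estimated MI."""
    best, best_mi = None, -np.inf
    # Exhaustive scan: the number of evaluations grows quadratically with
    # the pool size, which is the cost that query synthesis aims to avoid.
    for i in range(len(pool)):
        for j in range(i + 1, len(pool)):
            mi = mutual_information(pool[i], pool[j], w_samples)
            if mi > best_mi:
                best, best_mi = (i, j), mi
    return best, best_mi

rng = np.random.default_rng(0)
w_samples = rng.normal(size=(500, 3))  # hypothetical posterior samples
pool = rng.normal(size=(20, 3))        # hypothetical item feature vectors
pair, mi = best_pool_query(pool, w_samples)
```

Under this assumed model, a query between two identical items carries no information, since every weight sample predicts a 50/50 response. That is one simple way to see why comparisons between nearly identical items are uninformative.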