Eliciting reliable human feedback is essential for many machine learning tasks, such as learning from noisy labels and aligning AI systems with human preferences. Peer prediction mechanisms incentivize truthful reporting without ground-truth verification by scoring agents based on how their reports correlate with those of their peers. Traditional mechanisms ensure that truth-telling maximizes the expected score in equilibrium, so they can elicit honest information provided that agents' utilities are linear functions of their scores. In practice, however, non-linear payment rules are usually preferred, or agents' utilities are inherently non-linear. We propose stochastically dominant truthfulness (SD-truthfulness) as a stronger guarantee: the score distribution under truth-telling stochastically dominates the score distribution under every other strategy, which incentivizes truthful reporting for a wide range of monotone utility functions. Our first observation is that no existing peer prediction mechanism naturally satisfies this criterion without strong assumptions. A simple solution, rounding scores into binary lotteries, can enforce SD-truthfulness but often degrades sensitivity, a key property related to fairness and statistical efficiency. We demonstrate how a more careful application of rounding can better preserve sensitivity. Furthermore, we introduce a new enforced agreement (EA) mechanism that is theoretically guaranteed to be SD-truthful in binary-signal settings under mild assumptions and that empirically achieves the highest sensitivity among all known SD-truthful mechanisms.
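As a rough illustration of the gap between expected-score truthfulness and SD-truthfulness, the following minimal sketch may help. It assumes that "stochastically dominates" means first-order stochastic dominance, and it uses two made-up score distributions; the helper names `dominates` and `round_to_lottery`, the score bounds, and the numbers are hypothetical and are not taken from the mechanisms studied here.

```python
from fractions import Fraction

def dominates(p, q):
    """Return True if distribution p first-order stochastically dominates q.

    p and q map score -> probability. Dominance holds when the CDF of p
    never exceeds the CDF of q at any score.
    """
    cdf_p = cdf_q = Fraction(0)
    for s in sorted(set(p) | set(q)):
        cdf_p += p.get(s, 0)
        cdf_q += q.get(s, 0)
        if cdf_p > cdf_q:
            return False
    return True

def round_to_lottery(dist, lo, hi):
    """Round a bounded score into a binary lottery (illustrative only).

    A realized score s pays 1 with probability (s - lo) / (hi - lo) and 0
    otherwise, so the payout is Bernoulli with the normalized expected score.
    """
    win = sum(prob * Fraction(s - lo, hi - lo) for s, prob in dist.items())
    return {0: 1 - win, 1: win}

# Hypothetical score distributions (assumed for illustration).
half = Fraction(1, 2)
truthful = {0: half, 4: half}   # expected score 2, but risky
deviate = {1: Fraction(1)}      # expected score 1, but never scores 0

# Truth-telling has the higher mean yet does not dominate the deviation.
print(dominates(truthful, deviate))
# After rounding, comparing lotteries reduces to comparing expected scores.
print(dominates(round_to_lottery(truthful, 0, 4),
                round_to_lottery(deviate, 0, 4)))
```

In this toy case a sufficiently risk-averse agent could prefer the deviation before rounding, even though truth-telling has the higher expected score. After rounding, every payout is 0 or 1, so the comparison depends only on the win probability. The sketch does not model sensitivity, which is the property that naive rounding tends to degrade.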