When Are Two Lists Better than One?: Benefits and Harms in Joint Decision-making

Historically, much of machine learning research has focused on the performance of the algorithm alone, but recently more attention has turned to optimizing joint human-algorithm performance. Here, we analyze a specific type of human-algorithm collaboration in which the algorithm has access to a set of $n$ items and presents a subset of size $k$ to the human, who selects a final item from among those $k$. This scenario could model content recommendation, route planning, or any type of labeling task. Because both the human and the algorithm have imperfect, noisy information about the true ordering of items, the key question is: which value of $k$ maximizes the probability that the best item is ultimately selected? For $k=1$, the system reduces to the algorithm acting alone, and for $k=n$ it reduces to the human acting alone. Surprisingly, we show that for multiple noise models it is optimal to set $k \in [2, n-1]$; that is, there are strict benefits to collaborating, even when the human and algorithm have equal accuracy separately. We demonstrate this theoretically for the Mallows model and experimentally for the Random Utilities models of noisy permutations. However, we show that this pattern is reversed when the human is anchored on the algorithm's presented ordering: in that case the joint system always has strictly worse performance. We extend these results to the case where the human and algorithm differ in their accuracy levels, showing that there always exist regimes where a more accurate agent would strictly benefit from collaborating with a less accurate one, but that these regimes are asymmetric between the human's and the algorithm's accuracy.
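To make the setup concrete, the following is a minimal Monte Carlo sketch of the unanchored case, not the paper's own code. It assumes that both agents draw independent Mallows rankings around the true ordering, sampled by repeated insertion with a dispersion parameter `phi` in $[0, 1]$ (smaller means more accurate), and that the human picks the shortlisted item ranked highest in their own noisy ordering. The function names `sample_mallows` and `success_rate`, and the parameter values in the usage loop, are hypothetical.

```python
import random


def sample_mallows(n, phi, rng):
    """Sample a ranking of items 0..n-1 (item 0 is truly best).

    Uses the repeated insertion method: item i is inserted at index j
    with probability proportional to phi ** (i - j), where i - j is the
    number of inversions that insertion creates. This parameterization
    is an assumption of the sketch.
    """
    ranking = []
    for i in range(n):
        weights = [phi ** (i - j) for j in range(i + 1)]
        j = rng.choices(range(i + 1), weights=weights)[0]
        ranking.insert(j, i)
    return ranking


def success_rate(n, k, phi_alg, phi_human, trials, seed=0):
    """Estimate the probability that the truly best item is selected."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        alg = sample_mallows(n, phi_alg, rng)
        human = sample_mallows(n, phi_human, rng)
        # The algorithm presents its k top-ranked items.
        shortlist = set(alg[:k])
        # Assumed human behavior: choose the shortlisted item that comes
        # first in the human's own noisy ranking (no anchoring).
        choice = next(item for item in human if item in shortlist)
        wins += choice == 0
    return wins / trials


if __name__ == "__main__":
    n = 5
    for k in range(1, n + 1):
        rate = success_rate(n, k, phi_alg=0.7, phi_human=0.7, trials=20000)
        print(f"k={k}: estimated success rate {rate:.3f}")
```

Under these assumptions, `k=1` returns the algorithm's top item and `k=n` returns the human's top item, so the loop compares both endpoints against the intermediate values of $k$ discussed above.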
