In many real-world scenarios, such as recommender systems and political surveys, pairwise rankings are collected and aggregated to obtain an overall ranking of items. However, preference rankings can reveal individuals' personal preferences, underscoring the need to protect them before they are released for downstream analysis. In this paper, we address the challenge of preserving privacy while ensuring the utility of rank aggregation based on pairwise rankings generated from a general comparison model. A common privacy protection strategy in practice is to perturb raw pairwise rankings with the randomized response mechanism. However, a critical challenge arises because the privatized rankings no longer adhere to the original model, resulting in significant bias in downstream rank aggregation tasks. Motivated by this, we propose adaptively debiasing the rankings produced by the randomized response mechanism, ensuring consistent estimation of true preferences and enhancing the utility of downstream rank aggregation. Theoretically, we offer insights into the relationship between overall privacy guarantees and estimation errors from private ranking data, and establish minimax rates for estimation errors. This enables the determination of optimal privacy guarantees that balance consistency in rank aggregation with privacy protection. We also investigate convergence rates of expected ranking errors for partial and full ranking recovery, quantifying how privacy protection influences the specification of top-$K$ item sets and complete rankings. Our findings are validated through extensive simulations and a real application.
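To make the bias concrete, the following is a minimal sketch, not the paper's adaptive estimator. It assumes the textbook binary randomized response, in which each comparison outcome is kept with probability $p = e^{\epsilon}/(1+e^{\epsilon})$ and flipped otherwise, and it applies the standard moment-based correction to a single pair. The function names (`keep_probability`, `randomized_response`, `debias`, `simulate`) and the parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def keep_probability(epsilon):
    """Probability of reporting the true outcome under binary randomized response.

    Assumes epsilon > 0; at epsilon = 0 the reports carry no information.
    """
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomized_response(y, epsilon, rng):
    """Report y (1 if item i beat item j, else 0), flipped with probability 1 - p."""
    p = keep_probability(epsilon)
    return y if rng.random() < p else 1 - y


def debias(mean_private, epsilon):
    """Invert E[y_tilde] = (2p - 1) * theta + (1 - p) to recover theta.

    This is the standard correction, used here as an illustrative assumption.
    """
    p = keep_probability(epsilon)
    return (mean_private - (1.0 - p)) / (2.0 * p - 1.0)


def simulate(theta, epsilon, n, seed=0):
    """Return (naive, debiased) estimates of theta from n privatized comparisons."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n):
        y = 1 if rng.random() < theta else 0  # true comparison outcome
        total += randomized_response(y, epsilon, rng)
    naive = total / n  # mean of privatized reports, shrunk toward 1/2
    return naive, debias(naive, epsilon)


if __name__ == "__main__":
    naive, corrected = simulate(theta=0.8, epsilon=1.0, n=200_000)
    print(f"naive estimate:    {naive:.3f}")
    print(f"debiased estimate: {corrected:.3f}")
```

Under these assumptions the privatized mean is pulled toward $1/2$ by the factor $2p-1$, which is why the privatized rankings no longer follow the original comparison model. The correction removes that bias but inflates the variance by $1/(2p-1)^2$, which is one simple way to see the tension between privacy and estimation error discussed above.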