In this paper, we tackle the problem of computing a sequence of rankings that guarantees a Pareto-optimal balance between (1) maximizing the utility of the consumers and (2) minimizing unfairness between the producers of the items. Such a multi-objective optimization problem is typically solved by combining a scalarization method with linear programming over bi-stochastic matrices, which represent distributions over the possible rankings of items. However, this approach relies on the Birkhoff-von Neumann (BvN) decomposition, whose computational complexity is $\mathcal{O}(n^5)$, where $n$ is the number of items, making it impractical for large-scale systems. To address this drawback, we introduce a novel approach based on the Expohedron, a permutahedron whose points represent all achievable exposures of items. On the Expohedron, we profile the Pareto curve, which captures the trade-off between group fairness and user utility, by identifying a finite number of Pareto-optimal solutions. We further propose an efficient method that relaxes our optimization problem onto the Expohedron's circumscribed $n$-sphere, which significantly improves the running time. Moreover, the approximate Pareto curve is asymptotically close to the true Pareto-optimal curve as the number of substantial solutions increases. Our methods are applicable to different ranking merits that are non-decreasing functions of item relevance. The effectiveness of our methods is validated through experiments on both synthetic and real-world datasets.
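To make the geometry concrete, the following is a minimal sketch, not the authors' implementation, of why a permutahedron of exposures admits a circumscribed sphere. It assumes a DCG-style position-based exposure vector $\gamma_k = 1/\log_2(1+k)$, which the abstract does not specify; the function names `pbm_exposure`, `expohedron_vertices`, and `circumscribed_sphere` are hypothetical. Each vertex is the exposure vector of one deterministic ranking, that is, a permutation of $\gamma$, and every vertex lies at the same distance from the point whose coordinates all equal the mean of $\gamma$.

```python
import itertools
import math


def pbm_exposure(n):
    """Assumed position-based exposure: gamma_k = 1 / log2(1 + k) for k = 1..n."""
    return [1.0 / math.log2(1 + k) for k in range(1, n + 1)]


def expohedron_vertices(gamma):
    """Enumerate all permutations of gamma.

    Entry i of a vertex is the exposure item i receives under one ranking.
    Enumeration is factorial in n, so this is only for tiny illustrations.
    """
    return [tuple(p) for p in itertools.permutations(gamma)]


def circumscribed_sphere(gamma):
    """Center and radius of the sphere passing through every vertex.

    The center has all coordinates equal to mean(gamma); the radius is the
    Euclidean distance from gamma itself to that center.
    """
    n = len(gamma)
    mean = sum(gamma) / n
    center = [mean] * n
    radius = math.sqrt(sum((g - mean) ** 2 for g in gamma))
    return center, radius


gamma = pbm_exposure(4)
vertices = expohedron_vertices(gamma)
center, radius = circumscribed_sphere(gamma)

# Distance from each vertex to the center; all should match the radius.
distances = [math.dist(v, center) for v in vertices]
```

Because permuting the coordinates of $\gamma$ changes neither their sum nor their sum of squares, all $n!$ vertices are equidistant from the center. This is the property that makes relaxing an optimization problem onto the sphere plausible; how the relaxed solution is mapped back onto the Expohedron is not described in the abstract and is not sketched here.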