Mixture models of Plackett-Luce (PL), one of the most fundamental ranking models, are an active research area of both theoretical and practical significance. Most previously proposed parameter estimation algorithms instantiate the EM algorithm, often with random initialization. However, random initialization may not yield a good initial estimate, so the algorithms require multiple restarts, which substantially increases running time. As for the EM procedure itself, the E-step can be performed efficiently, but maximizing the log-likelihood in the M-step is difficult due to the combinatorial nature of the PL likelihood function (Gormley and Murphy 2008). Previous authors have therefore favored algorithms that maximize surrogate likelihood functions (Zhao et al. 2018, 2020), with the consequence that the final estimate may deviate from the true maximum likelihood estimate. In this paper, we address these known limitations. We propose an initialization algorithm that provides a provably accurate initial estimate and an EM algorithm that efficiently maximizes the true log-likelihood function. Experiments on both synthetic and real datasets show that our algorithm is competitive with baseline algorithms in both accuracy and speed, especially on datasets with a large number of items.
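To make concrete the claim that the E-step is cheap, the following is a minimal sketch of the PL log-likelihood of a full ranking and the resulting E-step responsibilities for a mixture. It is not the algorithm proposed in the paper: the function names `pl_log_likelihood` and `e_step`, the log-weight parameterization, and the assumption that every ranking is a complete ordering with the best item first are all illustrative choices. The M-step and the initialization procedure are not shown.

```python
import numpy as np

def pl_log_likelihood(ranking, log_w):
    """Log-probability of a full ranking (best item first) under one PL component.

    log_w[i] is the (assumed) log-weight of item i.
    """
    s = log_w[np.asarray(ranking)]
    # At position j the chosen item competes with all items not yet ranked,
    # i.e. positions j..end, so we need a reverse cumulative log-sum-exp.
    log_denoms = np.logaddexp.accumulate(s[::-1])[::-1]
    return float(np.sum(s - log_denoms))

def e_step(rankings, log_weights, mix):
    """Posterior responsibility of each mixture component for each ranking.

    rankings:    list of full rankings (sequences of item indices)
    log_weights: array of shape (K, n), one row of item log-weights per component
    mix:         array of shape (K,), mixing proportions summing to 1
    """
    K = log_weights.shape[0]
    log_post = np.array([
        [np.log(mix[k]) + pl_log_likelihood(r, log_weights[k]) for k in range(K)]
        for r in rankings
    ])
    # Normalize each row in log space for numerical stability.
    log_post -= np.logaddexp.reduce(log_post, axis=1, keepdims=True)
    return np.exp(log_post)

# Hypothetical two-component example over three items.
log_weights = np.log(np.array([[0.7, 0.2, 0.1],
                               [0.1, 0.2, 0.7]]))
mix = np.array([0.5, 0.5])
resp = e_step([[0, 1, 2], [2, 1, 0]], log_weights, mix)
```

Each responsibility row costs time linear in the number of items per component, which is why the E-step is not the bottleneck; the difficulty the abstract refers to lies in the M-step.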