A Multinomial Logit (MNL) model is composed of a finite universe of items $[n]=\{1,\dots,n\}$, each assigned a positive weight. A query specifies an admissible subset -- called a slate -- and the model chooses one item from that slate with probability proportional to its weight. This query model is also known as the Plackett-Luce model or conditional sampling oracle in the literature. Although MNLs have been studied extensively, a basic computational question remains open: given query access to slates, how efficiently can we learn weights so that, for every slate, the induced choice distribution is within total variation distance $\varepsilon$ of the ground truth? This question is central to MNL learning and has direct implications for modern recommender system interfaces. We provide two algorithms for this task, one with adaptive queries and one with non-adaptive queries. Each algorithm outputs an MNL $M'$ that induces, for each slate $S$, a distribution $M'_S$ on $S$ that is within $\varepsilon$ total variation distance of the true distribution. Our adaptive algorithm makes $O\left(\frac{n}{\varepsilon^{3}}\log n\right)$ queries, while our non-adaptive algorithm makes $O\left(\frac{n^{2}}{\varepsilon^{3}}\log n \log\frac{n}{\varepsilon}\right)$ queries. Both algorithms query only slates of size two and run in time proportional to their query complexity. We complement these upper bounds with lower bounds of $\Omega\left(\frac{n}{\varepsilon^{2}}\log n\right)$ for adaptive queries and $\Omega\left(\frac{n^{2}}{\varepsilon^{2}}\log n\right)$ for non-adaptive queries, thus proving that our adaptive algorithm is optimal in its dependence on the support size $n$, while the non-adaptive one is tight within a $\log n$ factor.
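The query model described above can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's algorithm: weights are assumed to be stored in a dictionary mapping items to positive reals, and the function names (`mnl_choice_probs`, `query_slate`) are hypothetical.

```python
import random

def mnl_choice_probs(weights, slate):
    """Choice distribution an MNL induces on a slate S:
    P(i | S) = w_i / sum_{j in S} w_j."""
    total = sum(weights[i] for i in slate)
    return {i: weights[i] / total for i in slate}

def query_slate(weights, slate, rng=random):
    """One call to the sampling oracle: draw a single item from the
    slate with probability proportional to its weight."""
    items = list(slate)
    return rng.choices(items, weights=[weights[i] for i in items], k=1)[0]
```

For example, with weights $w_1 = 1$ and $w_2 = 3$, the slate $\{1, 2\}$ yields choice probabilities $1/4$ and $3/4$; the algorithms in the paper issue queries only on such size-two slates.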