The $k$-Maximum Inner Product Search ($k$MIPS) serves as a foundational component in recommender systems and various data mining tasks. However, while most existing $k$MIPS approaches prioritize the efficient retrieval of highly relevant items for users, they often neglect an equally pivotal facet of search results: \emph{diversity}. To bridge this gap, we revisit and refine the diversity-aware $k$MIPS (D$k$MIPS) problem by incorporating two well-known diversity objectives -- minimizing the average and maximum pairwise item similarities within the results -- into the original relevance objective. This enhancement, inspired by Maximal Marginal Relevance (MMR), offers users a controllable trade-off between relevance and diversity. We introduce \textsc{Greedy} and \textsc{DualGreedy}, two linear scan-based algorithms tailored for D$k$MIPS. They both achieve data-dependent approximations and, when aiming to minimize the average pairwise similarity, \textsc{DualGreedy} attains an approximation ratio of $1/4$ with an additive term for regularization. To further improve query efficiency, we integrate a lightweight Ball-Cone Tree (BC-Tree) index with the two algorithms. Finally, comprehensive experiments on ten real-world data sets demonstrate the efficacy of our proposed methods, showcasing their capability to efficiently deliver diverse and relevant search results to users.
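To make the relevance-diversity trade-off concrete, the following is a minimal sketch of an MMR-style greedy linear scan. It is not the paper's \textsc{Greedy} or \textsc{DualGreedy} algorithm: the function name `greedy_diverse_mips`, the trade-off parameter `lam`, and the scoring rule (relevance minus average inner product with the items already chosen) are assumptions made for illustration only.

```python
import numpy as np


def greedy_diverse_mips(items, query, k, lam=0.5):
    """Greedily pick k item indices, trading relevance against diversity.

    Hypothetical scoring rule (an assumption, not the paper's objective):
        score(p) = lam * <p, q> - (1 - lam) * mean_{s in S} <p, s>
    where S is the set of items selected so far.
    """
    items = np.asarray(items, dtype=float)
    query = np.asarray(query, dtype=float)
    n = items.shape[0]
    k = min(k, n)

    relevance = items @ query            # inner product of every item with the query
    sim_sum = np.zeros(n)                # running sum of inner products with selected items
    available = np.ones(n, dtype=bool)   # items not yet selected
    selected = []

    for _ in range(k):
        # Average similarity to the current result set (zero while it is empty).
        penalty = sim_sum / len(selected) if selected else sim_sum
        score = lam * relevance - (1.0 - lam) * penalty
        score = np.where(available, score, -np.inf)  # mask already-selected items

        best = int(np.argmax(score))
        selected.append(best)
        available[best] = False
        sim_sum += items @ items[best]   # one linear scan per selected item

    return selected
```

With `lam=1.0` the sketch reduces to plain $k$MIPS by linear scan; lowering `lam` penalizes candidates that are similar to items already in the result, which is the kind of controllable trade-off the abstract describes.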