Sequential Diversification with Provable Guarantees

Diversification is a useful tool for exploring large collections of information items. It is used to reduce redundancy and cover multiple perspectives in information-search settings, with applications in many domains, including presenting search results in information-retrieval systems and selecting suggestions in recommender systems. Interestingly, existing measures of diversity are defined over \emph{sets} of items rather than \emph{sequences} of items. This design choice stands in contrast to commonly used relevance measures, which are defined over sequences and take the ranking of items into account. Sequential measures matter because information items are almost always presented sequentially, and users exploring them tend to prioritize higher-ranked items. In this paper, we study the problem of \emph{maximizing sequential diversity}. Sequential diversity is a new measure of \emph{diversity} that accounts for the \emph{ranking} of the items and incorporates \emph{item relevance} and \emph{user behavior}. The overarching framework can be instantiated with different diversity measures; here we consider \emph{sum~diversity} and \emph{coverage~diversity}. The problem was recently proposed by Coppolillo et al.~\citep{coppolillo2024relevance}, who introduced empirical methods that work well in practice. Our paper is a theoretical treatment of the problem: we establish its hardness and present algorithms with constant-factor approximation guarantees for both diversity measures. Experimentally, we demonstrate that our methods are competitive against strong baselines.
