We study a new problem in streaming submodular maximization, inspired by the dynamics of news-recommendation platforms. We consider a setting where users can visit a news website at any time, and upon each visit, the website must display up to $k$ news items. User interactions are inherently stochastic: each displayed news item is consumed by the user with a certain acceptance probability, and each news item covers certain topics. Our goal is to design a streaming algorithm that maximizes the expected total topic coverage. To address this problem, we establish a connection to submodular maximization subject to a matroid constraint. We show that previous methods can be adapted effectively to our problem when the number of user visits is known in advance, or when memory linear in the stream length is available. However, in more realistic scenarios where only an upper bound on the number of visits is known and only sublinear memory is available, these algorithms fail to provide any bounded performance guarantee. To overcome these limitations, we introduce a new online streaming algorithm that achieves a competitive ratio of $1/(8δ)$, where $δ$ controls the approximation quality. Moreover, it requires only a single pass over the stream and uses memory independent of the stream length. Empirically, our algorithms consistently outperform the baselines.
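To make the objective concrete, the following is a minimal sketch of one plausible formalization of expected topic coverage for a single visit. It assumes that items are accepted independently, so a topic is covered with probability one minus the product of the rejection probabilities of the displayed items that cover it. The names `expected_coverage` and `greedy_pick` are hypothetical, and the greedy selection is a standard offline baseline for monotone submodular functions, not the online streaming algorithm introduced in this work.

```python
from math import prod


def expected_coverage(items):
    """Expected number of covered topics for a list of displayed items.

    Each item is a pair (accept_prob, topics). Assumption: acceptances are
    independent, so a topic stays uncovered only if every displayed item
    covering it is rejected.
    """
    topics = set().union(*(t for _, t in items))
    return sum(
        1.0 - prod(1.0 - p for p, t in items if topic in t)
        for topic in topics
    )


def greedy_pick(candidates, k):
    """Pick up to k items for one visit by greedy marginal gain.

    Illustrative offline baseline only: it sees all candidates at once,
    unlike a streaming algorithm.
    """
    chosen = []
    remaining = list(candidates)
    for _ in range(min(k, len(remaining))):
        # Take the candidate that yields the largest coverage when added.
        best = max(remaining, key=lambda c: expected_coverage(chosen + [c]))
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Two overlapping items: the second adds less once the first is shown.
a = (0.5, {"x", "y"})
b = (0.5, {"y", "z"})
gain_alone = expected_coverage([b])
gain_after_a = expected_coverage([a, b]) - expected_coverage([a])
```

In this toy example, `gain_after_a` is smaller than `gain_alone` because both items cover topic `"y"`. This diminishing-returns behavior is what makes the objective submodular, and the limit of $k$ items per visit is the kind of constraint that a matroid formulation captures.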