Submodular optimization with bandit feedback has recently been studied in a variety of contexts. In a number of real-world applications, such as diversified recommender systems and data summarization, the submodular function exhibits additional linear structure. We consider developing approximation algorithms for the maximization of a submodular objective function $f:2^U\to\mathbb{R}_{\geq 0}$, where $f=\sum_{i=1}^d w_iF_{i}$. We assume value oracle access to the functions $F_i$, but the coefficients $w_i$ are unknown, and $f$ can only be accessed via noisy queries. We develop algorithms for this setting inspired by adaptive allocation algorithms for best-arm identification in linear bandits, with approximation guarantees arbitrarily close to those available in the setting where we have value oracle access to $f$. Finally, we empirically demonstrate that our algorithms make vast improvements in sample efficiency over algorithms that do not exploit the linear structure of $f$ on instances of movie recommendation.
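To make the setting concrete, the following is a minimal Python sketch, not the adaptive algorithms of this work. It assumes hypothetical coverage functions for the $F_i$, nonnegative weights, Gaussian noise, and a cardinality constraint $k$; none of these specifics come from the abstract. It estimates $w$ by ordinary least squares from noisy queries on randomly chosen sets (a non-adaptive stand-in for adaptive allocation) and then runs the standard greedy algorithm on the estimated objective.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, d, n_targets, k = 8, 3, 6, 3  # |U|, number of basis functions, targets, budget

# Hypothetical basis functions: F_i is a coverage function, and covers[i][u]
# is the set of targets that item u covers under F_i.
covers = [[set(rng.choice(n_targets, size=2, replace=False).tolist())
           for _ in range(n_items)] for _ in range(d)]

def features(S):
    """Value-oracle access to the basis: returns (F_1(S), ..., F_d(S))."""
    return np.array([len(set().union(*(covers[i][u] for u in S)))
                     for i in range(d)], dtype=float)

w_true = np.array([0.5, 0.3, 0.2])  # hidden from the learner; assumed nonnegative

def noisy_query(S, sigma=0.1):
    """Bandit feedback: f(S) plus zero-mean Gaussian noise (assumed noise model)."""
    return features(S) @ w_true + rng.normal(0.0, sigma)

# Non-adaptive estimation: query random sets, then solve least squares for w.
samples = [rng.choice(n_items, size=rng.integers(1, 5), replace=False).tolist()
           for _ in range(200)]
X = np.array([features(S) for S in samples])
y = np.array([noisy_query(S) for S in samples])
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

def greedy(w, k):
    """Standard greedy for max f(S) subject to |S| <= k, with f = sum_i w_i F_i."""
    S = []
    for _ in range(k):
        candidates = [u for u in range(n_items) if u not in S]
        best = max(candidates, key=lambda u: features(S + [u]) @ w)
        S.append(best)
    return S

S_hat = greedy(w_hat, k)
print("estimated weights:", np.round(w_hat, 3))
print("greedy set under estimated f:", S_hat)
```

Because each noisy query constrains all $d$ coefficients at once, the number of queries needed scales with $d$ rather than with the number of candidate sets, which is the intuition behind exploiting the linear structure. An adaptive scheme would choose which sets to query based on earlier responses instead of sampling them at random.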