We investigate stochastic combinatorial semi-bandits, where, unlike in standard bandits, the entire joint distribution of outcomes affects the complexity of the problem instance. The distribution families typically considered depend on specific parameter values whose prior knowledge is required in theory but which are difficult to estimate in practice; the commonly assumed sub-Gaussian family is one example. We alleviate this issue by considering a new general family of sub-exponential distributions, which contains bounded and Gaussian ones. We prove a new lower bound on the expected regret over this family, parameterized by the unknown covariance matrix of the outcomes, a tighter quantity than the sub-Gaussian matrix. We then construct an algorithm that uses covariance estimates and provide a tight asymptotic analysis of its regret. Finally, we apply and extend our results to the family of sparse outcomes, which has applications in many recommender systems.
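The abstract mentions an algorithm that relies on covariance estimates. The sketch below is not that algorithm; it only illustrates, under assumptions, how pairwise covariances could be estimated from semi-bandit feedback, where in each round only the outcomes of the items in the chosen action are observed. The class name `PairwiseCovarianceEstimator`, its methods, and the use of the plain (biased) empirical covariance are all hypothetical choices made for illustration.

```python
import numpy as np


class PairwiseCovarianceEstimator:
    """Illustrative running covariance estimates from semi-bandit feedback.

    Hypothetical helper, not taken from the paper: the covariance of items
    (i, j) is estimated only from rounds in which both were observed.
    """

    def __init__(self, n_items):
        # counts[i, j]: number of rounds where items i and j were both observed
        self.counts = np.zeros((n_items, n_items))
        # sum_i[i, j]: sum of X_i over rounds where i and j were both observed
        self.sum_i = np.zeros((n_items, n_items))
        # sum_prod[i, j]: sum of X_i * X_j over those same rounds
        self.sum_prod = np.zeros((n_items, n_items))

    def update(self, action, outcomes):
        """Record one round; `action` lists distinct item indices,
        `outcomes` the observed values in the same order."""
        idx = np.asarray(action)
        x = np.asarray(outcomes, dtype=float)
        block = np.ix_(idx, idx)
        self.counts[block] += 1
        self.sum_i[block] += x[:, None]
        self.sum_prod[block] += np.outer(x, x)

    def covariance(self):
        """Biased empirical covariance; NaN for pairs never observed together."""
        n = np.where(self.counts > 0, self.counts, np.nan)
        mean_i = self.sum_i / n
        mean_j = self.sum_i.T / n
        return self.sum_prod / n - mean_i * mean_j
```

A caller would invoke `update(action, outcomes)` once per round and read `covariance()` when an estimate is needed. Entries for pairs that were never played together stay `NaN`, which makes explicit that semi-bandit feedback reveals the covariance only for items that appear in a common action.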