We consider the sparse contextual bandit problem, in which arm features affect the reward through their inner product with a sparse parameter vector. Recent studies have developed sparsity-agnostic algorithms based on the greedy arm selection policy. However, the analysis of these algorithms requires strong assumptions on the arm feature distribution to ensure that the greedily selected samples are sufficiently diverse. One of the most common assumptions, relaxed symmetry, imposes approximate origin-symmetry on the distribution and therefore excludes distributions that have origin-asymmetric support. In this paper, we show in two ways that the greedy algorithm is applicable to a wider range of arm feature distributions. First, we show that a mixture distribution that has a greedy-applicable component is itself greedy-applicable. Second, we propose new distribution classes, related to Gaussian mixture, discrete, and radial distributions, for which sample diversity is guaranteed. The proposed classes can describe distributions with origin-asymmetric support and, in conjunction with the first result, provide theoretical guarantees for the greedy policy over a very wide range of arm feature distributions.
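To make "sample diversity" concrete, the following is a minimal sketch, not the paper's algorithm or analysis. It assumes that diversity is measured by the smallest eigenvalue of the empirical second-moment matrix of the greedily selected features, which is a common proxy. The names `greedy_select`, `min_eig_of_selected`, and `positive_orthant_arms` are hypothetical, and so are the fixed estimate `theta_hat` and the uniform distribution on the unit cube. The cube is chosen only because its support is origin-asymmetric; it is not claimed to belong to the proposed distribution classes.

```python
import numpy as np

def greedy_select(features, theta_hat):
    """Return the index of the arm with the largest estimated reward."""
    return int(np.argmax(features @ theta_hat))

def min_eig_of_selected(sample_arms, theta_hat, rounds, rng):
    """Smallest eigenvalue of the empirical second-moment matrix of the
    features chosen greedily over `rounds` rounds."""
    d = theta_hat.shape[0]
    second_moment = np.zeros((d, d))
    for _ in range(rounds):
        features = sample_arms(rng)  # shape (n_arms, d)
        x = features[greedy_select(features, theta_hat)]
        second_moment += np.outer(x, x)
    second_moment /= rounds
    # eigvalsh returns eigenvalues in ascending order
    return float(np.linalg.eigvalsh(second_moment)[0])

# Illustrative setup (all values are assumptions, not taken from the paper).
d, n_arms = 5, 4
theta_hat = np.zeros(d)
theta_hat[:2] = [1.0, -0.5]  # sparse: only two nonzero coordinates

def positive_orthant_arms(rng):
    # Uniform on [0, 1]^d: the support lies entirely in the positive orthant,
    # so it is not symmetric about the origin.
    return rng.uniform(0.0, 1.0, size=(n_arms, d))

rng = np.random.default_rng(0)
lam = min_eig_of_selected(positive_orthant_arms, theta_hat, rounds=2000, rng=rng)
print(f"min eigenvalue of greedy second-moment matrix: {lam:.3f}")
```

A strictly positive value means the greedily chosen features still span every direction in this toy setting. A real sparsity-agnostic algorithm would re-estimate `theta_hat` each round, for example with a Lasso fit, rather than holding it fixed.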