Features (a.k.a. context) are critical to the performance of contextual multi-armed bandits (MAB). In large-scale online systems, it is important to select and implement the right features for the model: missing important features can lead to sub-optimal reward outcomes, while including irrelevant features can cause overfitting, poor model interpretability, and added implementation cost. However, feature selection methods for conventional machine learning models fall short for contextual MAB use cases, because they select features that are correlated with the outcome variable, which are not necessarily the features that cause heterogeneous treatment effects among arms, and it is the latter that truly matter for contextual MAB. In this paper, we introduce model-free feature selection methods designed for the contextual MAB problem, based on the heterogeneous causal effect that a feature contributes to the reward distribution. Empirical evaluation is conducted on synthetic data as well as real data from an online experiment that optimizes content cover images in a recommender system. The results show that this feature selection method effectively selects the important features, which lead to higher contextual MAB reward than unimportant features. Compared with model-embedded methods, this model-free method has the advantages of fast computation, ease of implementation, and avoidance of model mis-specification issues.
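The distinction between a feature that is merely correlated with the reward and a feature that drives heterogeneous treatment effects among arms can be illustrated with a small synthetic sketch. This is not the paper's algorithm: the scoring function `heterogeneity_score`, the two-arm setup, the quantile binning, and the data-generating process are all assumptions made purely for illustration. The score is the weighted variance, across quantile bins of a feature, of the gap in mean reward between the two arms, computed without fitting any reward model.

```python
import numpy as np


def heterogeneity_score(x, arm, reward, n_bins=4):
    """Illustrative (hypothetical) model-free score for one feature.

    Splits x into quantile bins, estimates the arm-1 minus arm-0 mean
    reward gap inside each bin, and returns the weighted variance of
    those gaps across bins. A large value suggests the best arm changes
    with x. Assumes exactly two arms assigned at random.
    """
    # Interior quantile edges, e.g. the 25th/50th/75th percentiles for 4 bins.
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(x, edges)  # bin index in 0 .. n_bins - 1

    effects, weights = [], []
    for b in range(n_bins):
        in_bin = bins == b
        r1 = reward[in_bin & (arm == 1)]
        r0 = reward[in_bin & (arm == 0)]
        if len(r1) == 0 or len(r0) == 0:
            continue  # skip bins where one arm was never observed
        effects.append(r1.mean() - r0.mean())
        weights.append(in_bin.sum())

    effects = np.array(effects)
    weights = np.array(weights, dtype=float)
    weights /= weights.sum()
    mean_effect = np.sum(weights * effects)
    return float(np.sum(weights * (effects - mean_effect) ** 2))


# Synthetic data (assumed for illustration): arms are assigned uniformly
# at random, as they would be in a randomized online experiment.
rng = np.random.default_rng(0)
n = 20000
x_main = rng.normal(size=n)  # shifts reward equally for both arms
x_het = rng.normal(size=n)   # flips which arm is better
arm = rng.integers(0, 2, size=n)
reward = (
    2.0 * x_main
    + np.where(arm == 1, 1.0, -1.0) * x_het
    + rng.normal(size=n)
)

features = {"x_main": x_main, "x_het": x_het}
correlations = {k: float(np.corrcoef(v, reward)[0, 1]) for k, v in features.items()}
scores = {k: heterogeneity_score(v, arm, reward) for k, v in features.items()}

for name in features:
    print(f"{name}: corr with reward = {correlations[name]:+.3f}, "
          f"heterogeneity score = {scores[name]:.4f}")
```

In this setup, `x_main` is strongly correlated with the reward but does not change which arm is better, so a correlation-based selector would favor it even though it carries no information for arm choice. `x_het` is nearly uncorrelated with the reward yet determines the better arm, and the illustrative score ranks it far higher.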