Optimizing Feature Set for Click-Through Rate Prediction

Click-through rate (CTR) prediction models transform features into latent vectors and enumerate possible feature interactions over the input feature set to improve performance. Therefore, when selecting an optimal feature set, we should consider the influence of both features and their interactions. However, most previous works either focus on feature field selection or select only feature interactions over a fixed feature set. The former restricts the search space to feature fields, which is too coarse to distinguish individual features; it also does not filter out useless feature interactions, leading to higher computation costs and degraded model performance. The latter identifies useful feature interactions among all available features, leaving many redundant features in the feature set. In this paper, we propose a novel method named OptFS to address these problems. To unify the selection of features and their interactions, we decompose the selection of each feature interaction into the selection of two correlated features. Such a decomposition makes the model end-to-end trainable under various feature interaction operations. Adopting a feature-level search space, we assign a learnable gate to each feature to determine whether it should be included in the feature set. Because the search space is large, we develop a learning-by-continuation training scheme to learn these gates. Hence, OptFS generates a feature set containing only the features that improve the final prediction results. Experimentally, we evaluate OptFS on three public datasets, demonstrating that OptFS can optimize feature sets, which enhances model performance and further reduces both storage and computational cost.
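The gating idea described above can be illustrated with a minimal sketch. The abstract does not give the gate's functional form or the continuation schedule, so the sigmoid relaxation, the exponential temperature schedule, and every name below (`feature_gate`, `interaction_gate`, `temperature`, `selected_features`, `final_tau`) are assumptions made for illustration, not the paper's implementation. The sketch shows one per-feature gate, an interaction gate obtained as the product of the two feature gates, and a temperature that sharpens the soft gates toward a hard keep/drop decision as training proceeds.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def temperature(epoch: int, total_epochs: int, final_tau: float = 1000.0) -> float:
    """Assumed continuation schedule: grows exponentially from 1 to final_tau."""
    return final_tau ** (epoch / total_epochs)


def feature_gate(logit: float, tau: float) -> float:
    """Soft gate in (0, 1); approaches a 0/1 step function as tau grows."""
    return sigmoid(tau * logit)


def interaction_gate(logit_i: float, logit_j: float, tau: float) -> float:
    """Gate of an interaction, decomposed into the gates of its two features."""
    return feature_gate(logit_i, tau) * feature_gate(logit_j, tau)


def selected_features(logits: dict, tau: float, threshold: float = 0.5) -> set:
    """Features whose gate exceeds the threshold are kept in the feature set."""
    return {name for name, g in logits.items() if feature_gate(g, tau) > threshold}


if __name__ == "__main__":
    # Hypothetical learned gate parameters, one per feature value.
    logits = {"user_id=42": 0.8, "city=Paris": 0.05, "device=tv": -0.3}
    for epoch in (0, 5, 10):
        tau = temperature(epoch, total_epochs=10)
        gates = {k: round(feature_gate(v, tau), 4) for k, v in logits.items()}
        print(epoch, round(tau, 1), gates)
    print(selected_features(logits, temperature(10, 10)))
```

In this sketch an interaction is dropped as soon as either of its two features is gated out, so no separate parameter per interaction is needed; in a real model the gate parameters would be trained jointly with the embeddings rather than fixed by hand.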
