Large-scale online marketplaces and recommender systems serve as critical technological support for e-commerce development. In industrial recommender systems, features play vital roles as they carry information for downstream models. Accurate feature importance estimation is critical because it helps identify the most useful feature subsets from thousands of feature candidates for online services. Such selection enables improved online performance while reducing computational cost. To address feature selection problems in deep learning, trainable gate-based and sensitivity-based methods have been proposed and proven effective in industrial practice. However, through the analysis of real-world cases, we identified three bias issues that cause feature importance estimation to rely on partial model layers, samples, or gradients, ultimately leading to inaccurate importance estimation. We refer to these as layer bias, baseline bias, and approximation bias. To mitigate these issues, we propose FairFS, a fair and accurate feature selection algorithm. FairFS regularizes feature importance estimated across all nonlinear transformation layers to address layer bias. It also introduces a smooth baseline feature close to the classifier decision boundary and adopts an aggregated approximation method to alleviate baseline and approximation biases. Extensive experiments demonstrate that FairFS effectively mitigates these biases and achieves state-of-the-art feature selection performance.
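To make the notion of approximation bias concrete, the following is a minimal illustrative sketch, not the FairFS implementation. It assumes a toy one-layer sigmoid model, an all-zeros baseline, and hypothetical function names (`single_gradient_importance`, `aggregated_importance`) that do not come from the paper. It compares a sensitivity score built from a single gradient with one that averages gradients along the path from the baseline value to the actual feature value (a generic path-averaging scheme in the spirit of integrated gradients, used here as a stand-in for an "aggregated approximation").

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def model(x, w):
    # Toy stand-in for a recommender's prediction head (assumption).
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


def grad(x, w):
    # Analytic gradient of the sigmoid model with respect to each input feature.
    p = model(x, w)
    return [p * (1.0 - p) * wi for wi in w]


def exact_importance(x, baseline, w, i):
    # Ground truth: change in output when feature i is replaced by its baseline value.
    masked = list(x)
    masked[i] = baseline[i]
    return model(x, w) - model(masked, w)


def single_gradient_importance(x, baseline, w, i):
    # First-order estimate that uses only the gradient at the input point.
    return grad(x, w)[i] * (x[i] - baseline[i])


def aggregated_importance(x, baseline, w, i, steps=50):
    # Average the gradient at midpoints along the path from baseline[i] to x[i],
    # holding the other features fixed, then scale by the feature displacement.
    total = 0.0
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = list(x)
        point[i] = baseline[i] + alpha * (x[i] - baseline[i])
        total += grad(point, w)[i]
    return (total / steps) * (x[i] - baseline[i])


# Hypothetical weights and input chosen so that the sigmoid saturates at x.
w = [4.0, 0.5, -1.0]
x = [2.0, 1.0, 0.5]
baseline = [0.0, 0.0, 0.0]

exact = exact_importance(x, baseline, w, 0)
single = single_gradient_importance(x, baseline, w, 0)
aggregated = aggregated_importance(x, baseline, w, 0)
print(f"exact={exact:.4f} single={single:.4f} aggregated={aggregated:.4f}")
```

In this toy setting the model is saturated at the input, so the single gradient is close to zero and the first-order score badly underestimates the feature's true effect, while the path-averaged score tracks the exact output change closely. How FairFS actually constructs its baseline and aggregates gradients is specified in the paper, not here.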