Feature selection helps reduce data acquisition costs in machine learning, but the standard approach is to train models with static feature subsets. Here, we consider the dynamic feature selection (DFS) problem where a model sequentially queries features based on the currently available information. DFS is often addressed with reinforcement learning, but we explore a simpler approach of greedily selecting features based on their conditional mutual information. This method is theoretically appealing but requires oracle access to the data distribution, so we develop a learning approach based on amortized optimization. The proposed method is shown to recover the greedy policy when trained to optimality, and it outperforms numerous existing feature selection methods in our experiments, thus validating it as a simple but powerful approach for this problem.
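To make the greedy rule concrete, the following is a minimal sketch of the oracle version of the policy: at each step it picks the unobserved feature with the largest conditional mutual information with the label, given the values observed so far. It is not the amortized learning method proposed here. It assumes full access to a small discrete joint distribution stored as a probability table, which is exactly the oracle access that is unavailable in practice. The toy distribution and the names `build_toy_joint`, `cmi_with_label`, and `greedy_select` are hypothetical and chosen only for illustration.

```python
import itertools
import numpy as np


def entropy(p):
    """Shannon entropy (in nats) of a probability array, ignoring zero entries."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


def cmi_with_label(joint, observed, i):
    """I(y; x_i | x_S = x_s) for a joint table whose last axis is the label y.

    `observed` maps feature index -> observed value (assumed to have nonzero probability).
    """
    # Condition on the observed values; length-1 slices keep the axis layout intact.
    index = [slice(None)] * joint.ndim
    for j, v in observed.items():
        index[j] = slice(v, v + 1)
    cond = joint[tuple(index)]
    cond = cond / cond.sum()

    # Marginalize out every feature axis except the candidate i.
    num_features = joint.ndim - 1
    other = tuple(a for a in range(num_features) if a != i)
    p_iy = cond.sum(axis=other)  # shape: (values of x_i, values of y)

    # I(A; B) = H(A) + H(B) - H(A, B), under the conditional distribution.
    return entropy(p_iy.sum(axis=1)) + entropy(p_iy.sum(axis=0)) - entropy(p_iy)


def greedy_select(joint, observed):
    """Return the unobserved feature index with the largest conditional mutual information."""
    num_features = joint.ndim - 1
    candidates = [i for i in range(num_features) if i not in observed]
    return max(candidates, key=lambda i: cmi_with_label(joint, observed, i))


def build_toy_joint():
    """Hypothetical toy problem: x0 decides whether x1 or x2 is relevant to y."""
    joint = np.zeros((2, 2, 2, 4))
    for x0, x1, x2 in itertools.product(range(2), repeat=3):
        y = 2 * x0 + (x1 if x0 == 0 else x2)
        joint[x0, x1, x2, y] = 1 / 8  # features are independent and uniform
    return joint


joint = build_toy_joint()
first = greedy_select(joint, {})            # no features observed yet
next_if_zero = greedy_select(joint, {0: 0})  # after observing x0 = 0
next_if_one = greedy_select(joint, {0: 1})   # after observing x0 = 1
```

In this toy problem the policy first queries `x0`, then queries `x1` if `x0 = 0` and `x2` if `x0 = 1`. A static feature subset cannot express that choice, which is the motivation for selecting features dynamically.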