A Survey on Active Feature Acquisition Strategies

Active feature acquisition (AFA) studies how to sequentially acquire features for each data instance to trade off predictive performance against acquisition cost. This survey offers the first unified treatment of AFA via an explicit partially observable Markov decision process (POMDP) formulation. We place this formulation in the broader literature on optimal information acquisition and, more specifically, in a family of structured POMDPs (for example, information-gathering and sensing POMDPs) whose assumptions and algorithmic tools directly apply to AFA. This connection provides a common language for comparing problem settings and methods, and it highlights where AFA can leverage established results in structured POMDP planning and approximation. Building on this perspective, we present an up-to-date taxonomy of AFA methods that (roughly) mirrors standard approaches to solving POMDPs: (i) embedded cost-aware predictors (notably cost-sensitive decision trees and ensembles), (ii) model-based methods that plan using learned probabilistic components, (iii) model-free methods that learn acquisition policies from simulated episodes, and (iv) hybrid methods that combine the strengths of model-based and model-free approaches. We argue that this POMDP-centric view clarifies connections among existing methods and motivates more principled algorithm design. Since much prior work is heuristic and lacks formal guarantees, we also outline routes to guarantees by connecting AFA to adaptive stochastic optimization. We conclude by highlighting open challenges and promising directions for future research.
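To make the sequential trade-off concrete, the following minimal sketch runs one acquisition episode on a toy problem. It is an illustration only, not a method taken from the surveyed literature: it assumes a hand-specified naive Bayes model over a binary label and three binary features, and it uses a greedy, one-step-lookahead rule (expected entropy reduction minus weighted cost), loosely in the spirit of the model-based category (ii). All names and numbers (`PRIOR`, `LIKELIHOOD`, `COSTS`, `trade_off`, `run_episode`) are hypothetical.

```python
import math

# Hypothetical toy model: binary label y, three binary features that are
# conditionally independent given y. All numbers are made up for illustration.
PRIOR = [0.5, 0.5]              # p(y)
LIKELIHOOD = [                  # LIKELIHOOD[i][y] = p(x_i = 1 | y)
    [0.9, 0.1],                 # informative feature
    [0.6, 0.4],                 # weakly informative feature
    [0.5, 0.5],                 # uninformative feature
]
COSTS = [1.0, 0.5, 0.1]         # acquisition cost per feature


def feature_prob(i, v, y):
    """p(x_i = v | y) under the assumed naive Bayes model."""
    p1 = LIKELIHOOD[i][y]
    return p1 if v == 1 else 1.0 - p1


def posterior(observed):
    """Belief p(y | observed); `observed` maps feature index -> 0/1."""
    weights = list(PRIOR)
    for i, v in observed.items():
        for y in range(2):
            weights[y] *= feature_prob(i, v, y)
    total = sum(weights)
    return [w / total for w in weights]


def entropy(p):
    return -sum(q * math.log(q) for q in p if q > 0)


def expected_gain(observed, i):
    """Expected reduction in label entropy from acquiring feature i."""
    belief = posterior(observed)
    gain = entropy(belief)
    for v in (0, 1):
        p_v = sum(belief[y] * feature_prob(i, v, y) for y in range(2))
        if p_v > 0:
            gain -= p_v * entropy(posterior({**observed, i: v}))
    return gain


def run_episode(x, trade_off=0.1):
    """Greedily acquire features of instance x, then predict the label.

    Stops when no remaining feature has positive net value
    (expected gain minus trade_off * cost).
    """
    observed, total_cost = {}, 0.0
    while True:
        scores = {
            i: expected_gain(observed, i) - trade_off * COSTS[i]
            for i in range(len(COSTS)) if i not in observed
        }
        if not scores or max(scores.values()) <= 0:
            break
        best = max(scores, key=scores.get)
        observed[best] = x[best]        # pay the cost, reveal the value
        total_cost += COSTS[best]
    belief = posterior(observed)
    return belief.index(max(belief)), sorted(observed), total_cost
```

In this sketch the belief over the label plays the role of the POMDP belief state, each acquisition is an information-gathering action, and stopping to predict is the terminal action. The greedy rule is myopic, so it carries no optimality guarantee in general; raising `trade_off` makes the policy stop earlier.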
