Constructing decision trees online is a classical machine learning problem. Existing works often assume that features are readily available for each incoming data point. However, in many real-world applications, both feature values and labels are unknown a priori and can only be obtained at a cost. For example, in medical diagnosis, doctors have to choose which tests to perform on a patient (i.e., make costly feature queries) in order to reach a diagnosis (i.e., predict a label). We offer a fresh perspective on this practical challenge. Our framework consists of an active planning oracle embedded in an online learning scheme, for which we investigate several information acquisition functions. Specifically, we employ a surrogate information acquisition function based on adaptive submodularity to actively query feature values at minimal cost, while using a posterior sampling scheme to maintain low regret for online prediction. We demonstrate the efficiency and effectiveness of our framework via extensive experiments on various real-world datasets. Our framework also adapts naturally to the challenging setting of online learning with concept drift, where it is shown to be competitive with baseline models while being more flexible.
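The interplay between posterior sampling and cost-aware feature queries described above can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that are not in the abstract: binary features and labels, a naive Bayes model with Beta(1, 1) priors, a true label that is revealed after each prediction, and a plain expected-information-gain-per-cost rule standing in for the adaptive-submodular surrogate. All names (`OnlineLearner`, `expected_gain`, `threshold`, and so on) are hypothetical.

```python
import math
import random


def entropy(p):
    """Entropy in bits of a binary label with P(y=1) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def label_posterior(p_y1, theta, observed):
    """P(y=1 | observed features) under one sampled naive Bayes model."""
    w = [1.0 - p_y1, p_y1]
    for j, x in observed.items():
        for y in (0, 1):
            w[y] *= theta[y][j] if x == 1 else 1.0 - theta[y][j]
    total = w[0] + w[1]
    return w[1] / total if total > 0 else 0.5


def expected_gain(p_y1, theta, observed, j):
    """Expected reduction in label entropy from querying feature j."""
    p = label_posterior(p_y1, theta, observed)
    p_x1 = p * theta[1][j] + (1.0 - p) * theta[0][j]
    gain = entropy(p)
    for x, p_x in ((1, p_x1), (0, 1.0 - p_x1)):
        if p_x > 0:
            gain -= p_x * entropy(label_posterior(p_y1, theta, {**observed, j: x}))
    return gain


class OnlineLearner:
    """Hypothetical learner: Beta posteriors, sampled once per data point."""

    def __init__(self, costs, seed=0):
        self.costs = list(costs)  # assumed strictly positive query costs
        self.rng = random.Random(seed)
        self.label_counts = [1.0, 1.0]  # pseudo-counts for y=0 and y=1
        # feat_counts[y][j] = [count of x_j=1, count of x_j=0] given label y
        self.feat_counts = [[[1.0, 1.0] for _ in self.costs] for _ in (0, 1)]

    def sample_model(self):
        """Posterior sampling: draw one plausible set of parameters."""
        p_y1 = self.rng.betavariate(self.label_counts[1], self.label_counts[0])
        theta = [[self.rng.betavariate(a, b) for a, b in self.feat_counts[y]]
                 for y in (0, 1)]
        return p_y1, theta

    def acquire_and_predict(self, query, threshold=0.9):
        """Greedily buy features until the sampled model is confident."""
        p_y1, theta = self.sample_model()
        observed, remaining = {}, set(range(len(self.costs)))
        while remaining:
            p = label_posterior(p_y1, theta, observed)
            if max(p, 1.0 - p) >= threshold:
                break
            # Assumption: information gain per unit cost as the query rule.
            j = max(remaining, key=lambda k: expected_gain(
                p_y1, theta, observed, k) / self.costs[k])
            observed[j] = query(j)  # pay the cost, observe the value
            remaining.remove(j)
        p = label_posterior(p_y1, theta, observed)
        return (1 if p >= 0.5 else 0), observed

    def update(self, observed, label):
        """Update the posterior using only the features that were queried."""
        self.label_counts[label] += 1
        for j, x in observed.items():
            self.feat_counts[label][j][0 if x == 1 else 1] += 1
```

For each incoming data point, one would call `acquire_and_predict` with a callback that returns the value of a requested feature, then pass the queried features and the revealed label to `update`. Resampling the parameters at every round is what drives exploration in this sketch: early rounds draw diffuse models and query more features, while later rounds concentrate on the features the posterior considers informative.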