We study the problem of robust information selection for a Bayesian hypothesis testing/classification task, where the goal is to identify the true state of the world from a finite set of hypotheses based on observations from the selected information sources. We introduce a novel misclassification penalty framework, which enables non-uniform treatment of different misclassification events. Extending the classical subset selection framework, we study the problem of selecting, under a limited budget, a subset of sources that minimizes the maximum penalty of misclassification even when some of the selected sources are deleted or fail. We characterize the curvature properties of the objective function and propose an efficient greedy algorithm with performance guarantees. Next, we highlight certain limitations of optimizing for the maximum penalty metric and propose a submodular surrogate metric to guide the selection of the information set. We propose a greedy algorithm with near-optimality guarantees for optimizing the surrogate metric. Finally, we empirically demonstrate the performance of our proposed algorithms in several instances of the information set selection problem.
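To make the selection problem concrete, the following is a minimal Python sketch of greedy source selection against a worst-case penalty objective. It is an illustration under stated assumptions, not the algorithm analyzed in the paper: sources are assumed conditionally independent given the hypothesis with finite observation alphabets, every source has unit cost (so the budget is a cardinality limit), the "maximum penalty" is taken to be the largest expected penalty over true hypotheses under the Bayes decision rule, and robustness is handled by brute-force enumeration of up to `n_failures` deletions. The names `max_penalty`, `robust_penalty`, `greedy_select`, and the toy data are all hypothetical.

```python
from itertools import combinations, product


def max_penalty(selected, prior, likelihoods, penalty):
    """Largest expected penalty over true hypotheses (assumed metric).

    likelihoods[s][i][o] = P(source s emits symbol o | hypothesis i)
    penalty[i][j]        = cost of deciding j when i is true
    """
    m = len(prior)
    selected = list(selected)
    per_truth = [0.0] * m  # expected penalty conditioned on each true hypothesis
    alphabets = [range(len(likelihoods[s][0])) for s in selected]
    for obs in product(*alphabets):
        # Joint likelihood of this observation tuple under each hypothesis.
        like = [1.0] * m
        for s, o in zip(selected, obs):
            for i in range(m):
                like[i] *= likelihoods[s][i][o]
        post = [prior[i] * like[i] for i in range(m)]  # unnormalized posterior
        # Bayes decision: minimize the posterior expected penalty.
        decision = min(
            range(m),
            key=lambda j: sum(post[i] * penalty[i][j] for i in range(m)),
        )
        for i in range(m):
            per_truth[i] += like[i] * penalty[i][decision]
    return max(per_truth)


def robust_penalty(selected, n_failures, prior, likelihoods, penalty):
    """Worst value of max_penalty after deleting up to n_failures sources."""
    selected = list(selected)
    worst = 0.0
    for k in range(min(n_failures, len(selected)) + 1):
        for removed in combinations(selected, k):
            survivors = [s for s in selected if s not in removed]
            worst = max(worst, max_penalty(survivors, prior, likelihoods, penalty))
    return worst


def greedy_select(n_sources, budget, n_failures, prior, likelihoods, penalty):
    """Add, one at a time, the source that most reduces the robust penalty."""
    chosen = []
    for _ in range(budget):
        candidates = [s for s in range(n_sources) if s not in chosen]
        if not candidates:
            break
        best = min(
            candidates,
            key=lambda s: robust_penalty(
                chosen + [s], n_failures, prior, likelihoods, penalty
            ),
        )
        chosen.append(best)
    return chosen


# Toy instance (made up): two hypotheses, 0-1 penalties, three binary sources.
prior = [0.5, 0.5]
penalty = [[0.0, 1.0], [1.0, 0.0]]
likelihoods = [
    [[0.5, 0.5], [0.5, 0.5]],  # source 0: uninformative
    [[1.0, 0.0], [0.0, 1.0]],  # source 1: perfectly informative
    [[0.8, 0.2], [0.2, 0.8]],  # source 2: noisy
]

if __name__ == "__main__":
    print(greedy_select(3, 1, 0, prior, likelihoods, penalty))
    print(robust_penalty([1, 2], 1, prior, likelihoods, penalty))
```

In this toy instance, the robust objective for any single source with one allowed failure equals the no-information penalty, so the first greedy step is decided purely by tie-breaking. This is one simple way to see why a worst-case objective can give a greedy rule little to work with, and it is consistent with the abstract's motivation for a surrogate metric, although the surrogate itself is not reproduced here.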