We consider a class of optimization problems over stochastic variables where the algorithm can learn information about the value of any variable through a series of costly steps; we model this information acquisition process as a Markov Decision Process (MDP). The algorithm's goal is to minimize the cost of its solution plus the cost of information acquisition, or, alternatively, to maximize the value of its solution minus the cost of information acquisition. Such bandit superprocesses have been studied previously, but solutions are known only for fairly restrictive special cases. We develop a framework for approximate optimization of bandit superprocesses that applies to arbitrary processes with a matroid (and, in some cases, more general) feasibility constraint. Our framework establishes a bound on the optimal cost through a novel cost amortization; it then couples this bound with a notion of local approximation that allows approximate solutions for each component MDP in the superprocess to be composed without loss into a global approximation. We use this framework to obtain approximately optimal solutions for several variants of bandit superprocesses, for both maximization and minimization. We obtain new approximations for combinatorial versions of the previously studied Pandora's Box with Optional Inspection and Pandora's Box with Partial Inspection problems, as well as approximation algorithms for a new problem that we call the Weighing Scale problem.