We construct a family of Markov decision processes for which the policy iteration algorithm needs an exponential number of improving switches with Dantzig's rule, with Bland's rule, and with the Largest Increase pivot rule. This immediately translates to a family of linear programs for which the simplex algorithm needs an exponential number of pivot steps with the same three pivot rules. Our results yield a unified construction that simultaneously reproduces well-known lower bounds for these classical pivot rules, and we infer that no (deterministic or randomized) combination of them can avoid exponential worst-case behavior. Regarding the policy iteration algorithm, pivot rules typically switch multiple edges simultaneously, and our lower bounds for Dantzig's rule and the Largest Increase rule, which perform only single switches, seem novel. Regarding the simplex algorithm, the individual lower bounds were previously obtained separately via deformed hypercube constructions. In contrast to previous bounds for the simplex algorithm via Markov decision processes, our rigorous analysis is reasonably concise.