Motivated by techniques developed in recent progress on lower bounds for sublinear-time algorithms (Behnezhad, Roghani and Rubinstein, STOC 2023, FOCS 2023, and STOC 2024), we introduce and study a new class of randomized algorithmic processes that we call Markov chains with rewinding. In this setting, an algorithm interacts with a (partially observable) Markovian random evolution by strategically rewinding the Markov chain to previous states. Depending on the application, this may lead the evolution to desired states faster, or allow the agent to efficiently learn or test properties of the underlying Markov chain that would be infeasible or inefficient to obtain through passive observation. We study the task of identifying the initial state of a given partially observable Markov chain. The analysis of this question for specific Markov chains is the central ingredient in the works cited above, and our aim is to systematize that analysis. Our first result is that any pair of states distinguishable by some rewinding strategy can also be distinguished by a non-adaptive rewinding strategy, i.e., one whose rewinding choices are fixed before observing any outcomes of the chain. Therefore, while rewinding strategies are provably strictly more powerful than passive strategies (those that never rewind to previous states), adaptivity gives a rewinding strategy no additional power when efficiency is not a concern. The difference becomes apparent, however, once we introduce a natural efficiency measure: the query complexity, i.e., the number of observations a strategy needs in order to identify distinguishable states. Our second main contribution is to quantify this efficiency gap. We present a non-adaptive rewinding strategy whose query complexity is within a polynomial factor of that of the optimal (adaptive) strategy, and we show that such a polynomial loss is necessary in general.
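The rewinding interaction model above can be illustrated with a minimal sketch. The interface below (`RewindableChain`, `step`, `rewind`) is a hypothetical construction for exposition, not the paper's formal model: the chain keeps a history of realized hidden states, the agent observes only emissions, and a non-adaptive strategy fixes its rewind schedule in advance. Re-sampling the first transition many times is something a passive observer, who sees each step only once, cannot do.

```python
import random
from collections import Counter

class RewindableChain:
    """A partially observable Markov chain that the agent may rewind.

    Hypothetical interface for illustration: `step` advances the hidden
    state and returns an observation; `rewind(t)` restores the chain to
    its realized state at time t. Randomness after a rewind is fresh,
    so the future re-branches from the restored state.
    """
    def __init__(self, start, transitions, emissions, seed=0):
        self.rng = random.Random(seed)
        self.transitions = transitions  # state -> list of (next_state, prob)
        self.emissions = emissions      # state -> list of (observation, prob)
        self.history = [start]          # realized hidden states; index = time

    def _sample(self, dist):
        r, acc = self.rng.random(), 0.0
        for item, p in dist:
            acc += p
            if r < acc:
                return item
        return dist[-1][0]

    def step(self):
        s = self._sample(self.transitions[self.history[-1]])
        self.history.append(s)
        return self._sample(self.emissions[s])

    def rewind(self, t):
        # Discard the realized trajectory after time t.
        del self.history[t + 1:]

def first_step_frequencies(chain, samples=2000):
    """Non-adaptive strategy: the schedule 'rewind to time 0, take one
    step, observe' is repeated a fixed number of times, independent of
    any outcome -- hence non-adaptive."""
    counts = Counter()
    for _ in range(samples):
        chain.rewind(0)
        counts[chain.step()] += 1
    return counts

# Two candidate initial states, "a" and "b", whose one-step behavior
# differs only in probabilities. A single passive run yields one noisy
# sample of the first step; rewinding lets the agent estimate its
# distribution and so distinguish the two initial states.
T = {"a": [("x", 0.9), ("y", 0.1)], "b": [("x", 0.1), ("y", 0.9)],
     "x": [("x", 1.0)], "y": [("y", 1.0)]}
E = {"x": [(0, 1.0)], "y": [(1, 1.0)]}
freq_a = first_step_frequencies(RewindableChain("a", T, E))
freq_b = first_step_frequencies(RewindableChain("b", T, E))
```

With 2000 re-sampled first steps, `freq_a` concentrates on observation 0 and `freq_b` on observation 1, separating the two initial states with high confidence.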