In this paper, we analyze the regret incurred by naive exploration, a computationally efficient exploration strategy, when controlling unknown partially observable systems within the Linear Quadratic Gaussian (LQG) framework. We introduce a two-phase control algorithm called LQG-NAIVE: an initial phase injects Gaussian input signals to obtain a system model, and a second phase interleaves naive exploration and control in an episodic fashion. We show that LQG-NAIVE achieves a regret growth rate of $\tilde{\mathcal{O}}(\sqrt{T})$, i.e., $\mathcal{O}(\sqrt{T})$ up to logarithmic factors, after $T$ time steps, and we validate its performance through numerical simulations. Additionally, we propose LQG-IF2E, which extends the exploration signal to a `closed-loop' setting by incorporating the Fisher Information Matrix (FIM). We provide compelling numerical evidence that LQG-IF2E performs competitively with LQG-NAIVE.
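To make the two-phase structure concrete, the following is a minimal sketch of how the input signal might be scheduled. It is not the paper's implementation: the system matrices, the noise levels, the static output-feedback gain standing in for the LQG controller, the episode-doubling rule, and the exploration scale `ep_len ** -0.25` are all assumptions chosen for illustration, and the function name `run_two_phase` is hypothetical.

```python
import numpy as np


def run_two_phase(t_warm=200, n_episodes=4, first_len=100, seed=0):
    """Sketch of a warm-up phase followed by episodic naive exploration."""
    rng = np.random.default_rng(seed)

    # Hypothetical stable system (assumption, not from the paper):
    #   x_{t+1} = A x_t + B u_t + w_t,   y_t = C x_t + v_t
    A = np.array([[0.9, 0.1], [0.0, 0.8]])
    B = np.array([[0.0], [1.0]])
    C = np.array([[1.0, 0.0]])

    def step(x, u):
        # Measure the current state, then advance it by one time step.
        y = C @ x + 0.1 * rng.standard_normal(1)
        x_next = A @ x + B @ u + 0.1 * rng.standard_normal(2)
        return x_next, y

    x = np.zeros(2)
    inputs, outputs, explore_std = [], [], []

    # Phase 1: open-loop Gaussian inputs, collected for system identification.
    for _ in range(t_warm):
        u = rng.standard_normal(1)
        x, y = step(x, u)
        inputs.append(u)
        outputs.append(y)
        explore_std.append(1.0)

    # Phase 2: episodes of control plus additive Gaussian exploration.
    y_prev = y
    ep_len = first_len
    for _ in range(n_episodes):
        # A real algorithm would re-estimate the model from (inputs, outputs)
        # here and redesign the controller; this sketch skips that step.
        sigma = ep_len ** -0.25  # assumed decay: variance ~ 1 / sqrt(ep_len)
        for _ in range(ep_len):
            # Placeholder controller: static output feedback (assumption).
            u = -0.5 * y_prev + sigma * rng.standard_normal(1)
            x, y = step(x, u)
            inputs.append(u)
            outputs.append(y)
            explore_std.append(sigma)
            y_prev = y
        ep_len *= 2  # assumed doubling of episode length

    return np.array(inputs), np.array(outputs), np.array(explore_std)
```

Calling `run_two_phase()` returns the input and output trajectories along with the exploration standard deviation used at each step, which shrinks from one episode to the next as the model is assumed to improve.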