There has been remarkable progress over the past decade in establishing finite-sample, non-asymptotic bounds on recovering unknown system parameters from observed system behavior. Surprisingly, however, we show that the current state-of-the-art bounds do not accurately capture the statistical complexity of system identification, even in the most fundamental setting of estimating a discrete-time linear dynamical system (LDS) via ordinary least-squares regression (OLS). Specifically, we use asymptotic normality to identify classes of problem instances for which current bounds overstate the squared parameter error, in both spectral and Frobenius norm, by a factor of the state dimension of the system. Informed by this discrepancy, we then sharpen the OLS parameter error bounds via a novel second-order decomposition of the parameter error, in which, crucially, the lower-order term is a matrix-valued martingale that we show correctly captures the CLT scaling. From our analysis we obtain finite-sample bounds for both (i) stable systems and (ii) the many-trajectories setting that match the instance-specific optimal rates up to constant factors in Frobenius norm and up to polylogarithmic factors in the state dimension in spectral norm.
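To make the estimation setting concrete, here is a minimal sketch of OLS identification of an LDS x_{t+1} = A x_t + w_t from a single trajectory. It assumes i.i.d. Gaussian noise w_t ~ N(0, sigma^2 I), initial state x_0 = 0, and NumPy as the only dependency; the function names (simulate_lds, ols_estimate, ols_error_identity) are hypothetical and this is not the authors' code. The sketch checks only the standard first-order error identity A_hat - A = (sum_t w_t x_t^T)(sum_t x_t x_t^T)^{-1}, whose first factor is a matrix-valued martingale because w_t is independent of x_t; it does not implement the paper's second-order decomposition.

```python
import numpy as np


def simulate_lds(A, T, sigma=1.0, seed=0):
    """Simulate x_{t+1} = A x_t + w_t from x_0 = 0 with w_t ~ N(0, sigma^2 I).

    Assumed noise model for illustration. Returns states X (shape (T+1, n))
    and noise W (shape (T, n)), where row t of W is w_t.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    X = np.zeros((T + 1, n))
    W = sigma * rng.standard_normal((T, n))
    for t in range(T):
        X[t + 1] = A @ X[t] + W[t]
    return X, W


def ols_estimate(X):
    """OLS estimate of A from one trajectory X of shape (T+1, n).

    Minimizes sum_t ||x_{t+1} - A x_t||^2, whose solution is
    A_hat = (sum_t x_{t+1} x_t^T) (sum_t x_t x_t^T)^{-1}.
    """
    past, future = X[:-1], X[1:]
    # lstsq solves past @ B ~= future in the least-squares sense; B = A_hat^T.
    B, *_ = np.linalg.lstsq(past, future, rcond=None)
    return B.T


def ols_error_identity(X, W):
    """Right-hand side of A_hat - A = (sum_t w_t x_t^T)(sum_t x_t x_t^T)^{-1}."""
    past = X[:-1]
    cross = W.T @ past    # sum_t w_t x_t^T, a matrix-valued martingale in T
    gram = past.T @ past  # sum_t x_t x_t^T, the unnormalized empirical covariance
    return cross @ np.linalg.inv(gram)


# Usage: a stable 5-dimensional system (hypothetical instance).
A = 0.5 * np.eye(5)
X, W = simulate_lds(A, T=20000)
A_hat = ols_estimate(X)
err = A_hat - A
# Spectral-norm and Frobenius-norm parameter errors.
print(np.linalg.norm(err, 2), np.linalg.norm(err, "fro"))
```

Computing the error through the identity rather than only through A_hat separates the martingale numerator from the covariance denominator, which is the kind of structure the bounds above are stated in terms of.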