Learning from Censored and Dependent Data: The Case of Linear Dynamics

Observations from dynamical systems often exhibit irregularities such as censoring, where values are recorded only if they fall within a certain range. Censoring is ubiquitous in practice because of saturating sensors, limit-of-detection effects, and image-frame effects. In light of recent developments in learning linear dynamical systems (LDSs) and in censored statistics with independent data, we revisit the decades-old problem of learning an LDS from censored observations (Lee and Maddala (1985); Zeger and Brookmeyer (1986)). Here, the learner observes the state $x_t \in \mathbb{R}^d$ if and only if $x_t$ belongs to some set $S_t \subseteq \mathbb{R}^d$. We develop the first computationally and statistically efficient algorithm for learning the system, assuming only oracle access to the sets $S_t$. Our algorithm, Stochastic Online Newton with Switching Gradients, is a novel second-order method that builds on the Online Newton Step (ONS) of Hazan et al. (2007). Our Switching-Gradient scheme does not always use (stochastic) gradients of the function we want to optimize, which we call the "censor-aware" function. Instead, in each iteration it performs a simple test to decide whether to obtain the stochastic gradient from the censor-aware function or from another, "censor-oblivious" function. In our analysis, we consider a "generic" Online Newton method, which uses arbitrary vectors in place of gradients, and we prove an error bound for it. This bound then guides the design of these vectors, leading to our Switching-Gradient scheme. This framework deviates significantly from the recent long line of work on censored statistics (e.g., Daskalakis et al. (2018); Kontonis et al. (2019); Daskalakis et al. (2019)), which applies Stochastic Gradient Descent (SGD) and whose analysis reduces to establishing conditions for off-the-shelf SGD bounds.
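To make the observation model concrete, the following minimal sketch simulates censored data from a linear system. It rests on several assumptions not stated in the abstract: dynamics of the form $x_{t+1} = A x_t + w_t$ with Gaussian noise $w_t$, an initial state $x_0 = 0$, and a time-invariant box as the sets $S_t$ (loosely mimicking a saturating sensor). The names `simulate_censored_lds`, `in_set`, and `in_box` are hypothetical, and the sketch illustrates only the data-generating process, not the learning algorithm.

```python
import random


def simulate_censored_lds(A, T, in_set, noise_std=1.0, seed=0):
    """Simulate x_{t+1} = A x_t + w_t (assumed form) and censor the states.

    `in_set(t, x)` is a hypothetical membership oracle for S_t.
    Returns a list of length T whose t-th entry is x_t if x_t is in S_t,
    and None if the state was censored.
    """
    rng = random.Random(seed)
    d = len(A)
    x = [0.0] * d  # assumed initial state x_0 = 0
    observations = []
    for t in range(T):
        # The learner sees x_t if and only if the oracle says x_t is in S_t.
        observations.append(list(x) if in_set(t, x) else None)
        # Advance the state with i.i.d. Gaussian noise (an assumption).
        noise = [rng.gauss(0.0, noise_std) for _ in range(d)]
        x = [sum(A[i][j] * x[j] for j in range(d)) + noise[i] for i in range(d)]
    return observations


def in_box(t, x, limit=1.5):
    """Hypothetical oracle: S_t is the box [-limit, limit]^d for every t."""
    return all(abs(v) <= limit for v in x)


A = [[0.5, 0.1], [0.0, 0.4]]  # illustrative stable system matrix
obs = simulate_censored_lds(A, T=200, in_set=in_box)
observed = [x for x in obs if x is not None]
print(f"observed {len(observed)} of {len(obs)} states")
```

The learner in this setting would receive only `obs` and the ability to call the oracle, never the censored states themselves. Because consecutive states are dependent, dropping the censored entries does not leave an i.i.d. sample, which is what separates this problem from censored statistics with independent data.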
