We introduce a framework for the data-driven discovery of stochastic differential equations (SDEs) that unifies, for the first time, the weak-form integration-by-parts approach of Weak SINDy with the stochastic system identification goal of stochastic SINDy. The central novelty is the adoption of spatial Gaussian test functions $K_j(x)=\exp(-|x-x_j|^2/2h^2)$ in place of temporal test functions. Because the kernel weight $K_j(X_{t_n})$ is $\mathcal{F}_{t_n}$-measurable and the Brownian innovation $ξ_n$ is independent of $\mathcal{F}_{t_n}$, every noise term in the projected response has zero conditional mean given the current state -- a property that guarantees unbiasedness in expectation and prevents the structural regression bias that afflicts temporal test functions in the stochastic setting. This design choice converts the SDE identification problem into two sparse linear systems -- one for the drift $b(x)$ and one for the diffusion tensor $a(x)$ -- that share a single design matrix and are solved jointly via $\ell_1$-regularised regression with grouped cross-validation. A two-step bias-correction procedure handles state-dependent diffusion. Validated on the Ornstein--Uhlenbeck process, the double-well Langevin system, and a multiplicative diffusion process, the method recovers all active polynomial generators with coefficient errors below 4\%, stationary-density total-variation distances below 0.01, and autocorrelation functions that faithfully reproduce true relaxation timescales across all three benchmarks.
翻译:我们提出一个框架,用于数据驱动的随机微分方程发现,该框架首次将弱SINDy的弱形式分部积分方法与随机SINDy的随机系统辨识目标统一起来。核心创新在于采用空间高斯测试函数$K_j(x)=\exp(-|x-x_j|^2/2h^2)$替代时间测试函数。由于核权重$K_j(X_{t_n})$是$\mathcal{F}_{t_n}$可测的,而布朗创新$\xi_n$与$\mathcal{F}_{t_n}$独立,投影响应中的每个噪声项在给定当前状态的条件期望为零——这一性质保证了期望上的无偏性,并避免了在随机设置中时间测试函数所遭受的结构性回归偏差。该设计选择将SDE辨识问题转化为两个稀疏线性系统——一个针对漂移项$b(x)$,另一个针对扩散张量$a(x)$——这两个系统共享单一设计矩阵,并通过结合分组交叉验证的$\ell_1$正则化回归联合求解。针对状态依赖扩散,采用两步偏差校正程序。在Ornstein-Uhlenbeck过程、双势阱Langevin系统以及乘性扩散过程上的验证表明,该方法能够恢复所有活跃的多项式生成元,系数误差低于4%,平稳密度全变差距离低于0.01,且自相关函数在所有三个基准测试中忠实再现了真实弛豫时间尺度。