Extraction of informative statistical features in the problem of forecasting time series generated by Itô-type processes

In this paper, we consider the problem of extracting the most informative features from time series that are regarded as observed values of stochastic processes satisfying Itô stochastic differential equations with unknown random drift and diffusion coefficients. We do not draw on any external information and use only the information contained in the time series itself. Therefore, as additional features, we use the parameters of statistically fitted mixture-type models of the observed regularities in the behavior of the time series. Several algorithms for constructing these parameters are discussed. These algorithms are based on statistical reconstruction of the coefficients, which, in turn, is based on statistical separation of normal mixtures. We obtain two types of parameters using uniform and non-uniform statistical reconstruction of the coefficients of the underlying Itô process. The coefficients reconstructed by uniform techniques do not depend on the current value of the process, whereas the non-uniform techniques reconstruct the coefficients taking into account their dependence on the value of the process. In effect, the non-uniform techniques used in this paper represent a stochastic analog of the Taylor expansion for the time series. The efficiency of the obtained additional features is compared by using them in autoregressive algorithms for time series prediction. To obtain a clean conclusion that is not affected by extraneous factors, such as a particular choice of neural network architecture, we use only simple autoregressive algorithms. We show that the use of the additional statistical features improves the prediction.
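
As a rough illustration of the uniform reconstruction idea, the following Python sketch fits a two-component normal mixture to the increments in each sliding window with a basic EM algorithm and turns the fitted parameters into features. The function names, the window length, the number of components, the EM initialization, and the split of the increment variance into "diffusive" and "dynamic" parts are all assumptions made for this sketch and are not taken from the paper.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def fit_normal_mixture(xs, k=2, n_iter=50, min_std=1e-6):
    """Basic EM for a k-component normal mixture (hypothetical helper)."""
    n = len(xs)
    srt = sorted(xs)
    # Deterministic start: component means from k equal blocks of the sorted sample.
    blocks = [srt[i * n // k:(i + 1) * n // k] for i in range(k)]
    weights = [1.0 / k] * k
    means = [sum(b) / len(b) for b in blocks]
    center = sum(xs) / n
    spread = math.sqrt(sum((x - center) ** 2 for x in xs) / n)
    stds = [max(spread, min_std)] * k
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and standard deviations.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            if nj > 1e-12:
                means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
                var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
                stds[j] = max(math.sqrt(var), min_std)
    return weights, means, stds


def window_features(series, window=100, k=2):
    """Per-window features derived from the mixture fitted to the increments."""
    incs = [b - a for a, b in zip(series, series[1:])]
    feats = []
    for end in range(window, len(incs) + 1):
        w, m, s = fit_normal_mixture(incs[end - window:end], k)
        drift = sum(wi * mi for wi, mi in zip(w, m))            # mixture mean
        diffusive = sum(wi * si ** 2 for wi, si in zip(w, s))   # mean within-component variance
        dynamic = sum(wi * (mi - drift) ** 2 for wi, mi in zip(w, m))  # spread of component means
        feats.append((drift, diffusive, dynamic))
    return feats


if __name__ == "__main__":
    rng = random.Random(0)
    path = [0.0]
    for _ in range(150):
        # Toy increments: a random regime picks the scale of each step.
        scale = 0.05 if rng.random() < 0.7 else 0.3
        path.append(path[-1] + rng.gauss(0.01, scale))
    for row in window_features(path, window=100)[:3]:
        print(row)
```

In a setup like this, each feature tuple could be appended to the lagged values of the series as extra regressors in a simple autoregressive model. A non-uniform variant would additionally condition the fit on the current value of the process, which this sketch does not attempt.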
