Data preprocessing is a crucial part of any machine learning pipeline, and it can have a significant impact on both performance and training efficiency. This is especially evident when using deep neural networks for time series prediction and classification: real-world time series data often exhibit irregularities such as multi-modality, skewness and outliers, and model performance can degrade rapidly if these characteristics are not adequately addressed. In this work, we propose the EDAIN (Extended Deep Adaptive Input Normalization) layer, a novel adaptive neural layer that learns how to appropriately normalize irregular time series data for a given task in an end-to-end fashion, instead of using a fixed normalization scheme. The layer's unknown parameters are optimized jointly with those of the deep neural network using back-propagation. Our experiments, conducted using synthetic data, a credit default prediction dataset, and a large-scale limit order book benchmark dataset, demonstrate the superior performance of the EDAIN layer when compared to conventional normalization methods and existing adaptive time series preprocessing layers.
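To make the idea of a normalization step trained jointly with the downstream model concrete, the following is a minimal sketch and not the EDAIN layer itself. It assumes a toy setting invented for illustration: synthetic bimodal series, a single learnable shift weight `a` applied to each series mean, and a logistic classifier (`w`, `b`) standing in for the deep network. The names `make_series` and `train` and all hyperparameters are hypothetical. Gradients are written out by hand so that the back-propagation path through the normalization step is visible.

```python
import math
import random


def make_series(rng, n_series=200, length=10):
    """Synthetic bimodal data: each series fluctuates around an offset of -2 or +2.

    The label asks whether the last value lies above the series' own mean,
    so the raw offset is a nuisance that a good normalization should remove.
    """
    data = []
    for i in range(n_series):
        offset = 2.0 if i % 2 == 0 else -2.0
        xs = [offset + rng.gauss(0.0, 1.0) for _ in range(length)]
        mean = sum(xs) / length
        label = 1.0 if xs[-1] > mean else 0.0
        data.append((xs[-1], mean, label))
    return data


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train(data, steps=2000, lr=0.05):
    """Full-batch gradient descent on the shift weight and the classifier together."""
    a, w, b = 0.0, 0.1, 0.0  # a = 0 means "no normalization" at the start
    n = len(data)
    history = []
    for _ in range(steps):
        grad_a = grad_w = grad_b = loss = 0.0
        for x_last, mean, y in data:
            z = x_last - a * mean              # adaptive shift (forward pass)
            p = sigmoid(w * z + b)             # classifier output
            p_safe = min(max(p, 1e-12), 1.0 - 1e-12)
            loss -= y * math.log(p_safe) + (1.0 - y) * math.log(1.0 - p_safe)
            d = p - y                          # dLoss/dlogit for cross-entropy
            grad_w += d * z
            grad_b += d
            grad_a += d * w * (-mean)          # chain rule through z into the shift
        history.append(loss / n)
        a -= lr * grad_a / n
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return a, w, b, history


rng = random.Random(0)
data = make_series(rng)
a, w, b, history = train(data)
print(f"loss: {history[0]:.4f} -> {history[-1]:.4f}, learned shift weight a = {a:.3f}")
```

In this toy setup the label depends only on the last value relative to the series mean, so a shift weight near 1 is the natural solution. Whether a given run gets there depends on the learning rate and the number of steps. A fixed scheme would set `a` in advance, whereas here it is updated by the same loss gradient as the classifier weights.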