Dynamic Feature Engineering and Model Selection Methods for Temporal Tabular Datasets with Regime Changes

The application of deep learning algorithms to temporal panel datasets is difficult because heavy non-stationarities can lead to over-fitted models that under-perform under regime changes. In this work we propose a new machine learning pipeline for ranking predictions on temporal panel datasets that is robust to regime changes in the data. Different machine learning models, including Gradient Boosting Decision Trees (GBDTs) and Neural Networks with and without simple feature engineering, are evaluated in the pipeline under different settings. We find that GBDT models with dropout display high performance, robustness and generalisability with relatively low complexity and reduced computational cost. We then show that online learning techniques can be used in post-prediction processing to enhance the results. In particular, dynamic feature neutralisation improves robustness by reducing drawdown during regime changes. It is an efficient procedure that requires no retraining of models and can be applied post-prediction to any machine learning model. Furthermore, we demonstrate that building model ensembles through dynamic model selection based on recent model performance improves on the baseline, raising the Sharpe and Calmar ratios of out-of-sample predictions. We also evaluate the robustness of our pipeline across different data splits and random seeds and find that the results are well reproducible.
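Feature neutralisation, as the term is commonly used in Numerai-style ranking tasks, removes part or all of the linear exposure of the predictions to the features within a single era. The sketch below is a minimal, static version built on that assumption. `neutralise` and its `proportion` argument are hypothetical names, not the paper's code. The rule for choosing which features to neutralise against in each era (the "dynamic" part) is left to the caller because the abstract does not specify it. The function only post-processes predictions, so no model is retrained.

```python
import numpy as np


def neutralise(preds, features, proportion=1.0):
    """Remove `proportion` of the linear exposure of `preds` to `features`.

    Hypothetical helper (a sketch, not the paper's implementation).
    preds:    shape (n,)   model predictions for the rows of one era
    features: shape (n, k) the features to neutralise against, same rows
    Returns centred predictions; only their ranking matters downstream.
    """
    preds = np.asarray(preds, dtype=float)
    features = np.asarray(features, dtype=float)
    # Centre both so the least-squares fit needs no intercept column.
    preds_c = preds - preds.mean()
    feats_c = features - features.mean(axis=0)
    # beta minimises ||feats_c @ beta - preds_c||^2; the pseudo-inverse
    # also copes with collinear features.
    beta = np.linalg.pinv(feats_c) @ preds_c
    # Subtract the chosen fraction of the fitted linear component.
    return preds_c - proportion * (feats_c @ beta)
```

A caller would pass one era's rows at a time. With `proportion=1.0` the output is uncorrelated with every supplied feature. Smaller values trade some exposure for keeping more of the original signal. In a dynamic scheme the feature subset, and possibly `proportion`, would be re-chosen from recent eras; that is an assumption about how the procedure is organised.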
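Dynamic model selection can be sketched as follows, assuming the realised per-era score of each candidate model (for example its correlation with the target) is known for past eras. At era `t`, the sketch keeps the `k` models with the highest mean score over the previous `window` eras and averages their predictions with equal weight. The function names, the mean-score criterion, the equal weighting and the default values are all illustrative assumptions. The abstract states only that selection is based on recent model performance. If targets resolve with a delay, the window would have to end that many eras earlier.

```python
import numpy as np


def select_models(score_history, t, window=4, k=2):
    """Indices of the k models with the best mean score over eras [t-window, t).

    Hypothetical helper: score_history has shape (n_eras, n_models) and holds
    realised per-era scores. Only eras strictly before t are used, so the
    choice never looks at era t's own outcome.
    """
    score_history = np.asarray(score_history, dtype=float)
    past = score_history[max(0, t - window):t]
    if past.shape[0] == 0:
        # Assumption: with no history yet, fall back to using every model.
        return np.arange(score_history.shape[1])
    recent_mean = past.mean(axis=0)
    # Sort descending by recent mean score and keep the top k.
    return np.argsort(recent_mean)[::-1][:k]


def dynamic_ensemble(preds_t, score_history, t, window=4, k=2):
    """Equal-weight average of the selected models' predictions for era t.

    preds_t: shape (n_rows, n_models), every model's predictions at era t.
    Returns the ensemble predictions and the chosen model indices.
    """
    preds_t = np.asarray(preds_t, dtype=float)
    chosen = select_models(score_history, t, window, k)
    return preds_t[:, chosen].mean(axis=1), chosen
```

Because selection uses only past scores, the ensemble can be refreshed every era without retraining any model. A short `window` reacts faster to regime changes but is noisier; a longer one is more stable but slower to adapt.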
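The Sharpe and Calmar ratios used to compare out-of-sample performance can be computed from a series of per-era scores. The sketch assumes common, unannualised definitions. Sharpe is the mean per-era score divided by its sample standard deviation. Calmar is the mean per-era score divided by the maximum drawdown of the cumulative score, where scores are summed over eras starting from zero. The paper may annualise or define drawdown differently, and the function names are hypothetical.

```python
import numpy as np


def sharpe_ratio(scores):
    """Mean per-era score over its sample standard deviation (not annualised)."""
    scores = np.asarray(scores, dtype=float)
    return scores.mean() / scores.std(ddof=1)


def max_drawdown(scores):
    """Largest fall of the cumulative score from its running peak."""
    # Cumulative score path, starting from zero before the first era.
    path = np.concatenate([[0.0], np.cumsum(np.asarray(scores, dtype=float))])
    running_peak = np.maximum.accumulate(path)
    return float(np.max(running_peak - path))


def calmar_ratio(scores):
    """Mean per-era score over maximum drawdown (inf if there is no drawdown)."""
    dd = max_drawdown(scores)
    mean = float(np.mean(scores))
    return np.inf if dd == 0 else mean / dd
```

Take the scores `[0.02, -0.01, 0.03, -0.04, 0.01]`. The cumulative score peaks at 0.04 after the third era and falls to 0.00 after the fourth, so the maximum drawdown is 0.04. The mean score is 0.002, which gives a Calmar ratio of 0.05 (up to floating-point rounding). Reducing drawdown during regime changes therefore raises the Calmar ratio even when the mean score is unchanged.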

