Conservative Predictions on Noisy Financial Data

Price movements in financial markets are well known to be very noisy. As a result, even if there are occasionally exploitable patterns that machine-learning algorithms could pick up, they are obscured by feature and label noise, which makes the predictions less useful and risky in practice. Traditional rule-learning techniques developed for noisy data, such as CN2, seek only high-precision rules and refrain from making predictions where their antecedents do not apply. We apply a similar approach, in which a model abstains from making a prediction on data points about which it is uncertain. During training, a cascade of such models is learned in sequence, similar to rule lists, with each model trained only on the data on which the previous model(s) were uncertain. Similar pruning of data takes place at test time, so that predictions are more accurate but are made on only a fraction (the support) of the test-time data. In a financial prediction setting, such an approach allows decisions to be taken only when the ensemble model is confident, thereby reducing risk. We present results using traditional MLPs as well as differentiable decision trees, on synthetic data as well as real financial market data, to predict fixed-term returns using commonly used features. We submit that our approach is likely to result in better overall returns at a lower level of risk. In this context we introduce a utility metric to measure the average gain per trade, as well as the return adjusted for downside risk, both of which are improved significantly by our approach.
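
The cascade described above can be illustrated with a minimal sketch. This is not the authors' implementation: a tiny logistic regression stands in for the MLPs and differentiable decision trees, confidence is assumed to be the larger of the two class probabilities, and the function names, the 0.8 threshold, the three levels, and the one-dimensional synthetic data are all hypothetical choices made for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def prob_up(model, x):
    # Probability of class 1 ("up") under a linear-logistic model (w, b).
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic(X, y, epochs=200, lr=0.5):
    # Batch gradient descent on log-loss; an assumed stand-in for an MLP.
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = prob_up((w, b), xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def is_confident(model, x, threshold):
    # Assumed confidence rule: max class probability reaches the threshold.
    p = prob_up(model, x)
    return max(p, 1.0 - p) >= threshold


def train_cascade(X, y, levels=3, threshold=0.8):
    # Each level is trained only on points the previous levels were unsure about.
    models = []
    for _ in range(levels):
        if len(X) < 2 or len(set(y)) < 2:
            break
        model = fit_logistic(X, y)
        models.append(model)
        keep = [i for i, xi in enumerate(X) if not is_confident(model, xi, threshold)]
        X, y = [X[i] for i in keep], [y[i] for i in keep]
    return models


def predict_cascade(models, x, threshold=0.8):
    # The first confident level decides; None means the cascade abstains.
    for model in models:
        if is_confident(model, x, threshold):
            return 1 if prob_up(model, x) >= 0.5 else 0
    return None


# Synthetic demo: the label is the sign of a noisy 1-D feature.
rng = random.Random(0)
X = [[rng.uniform(-3.0, 3.0)] for _ in range(200)]
y = [1 if x[0] + rng.gauss(0.0, 0.5) > 0 else 0 for x in X]

models = train_cascade(X, y)
preds = [predict_cascade(models, x) for x in X]
covered = [(p, t) for p, t in zip(preds, y) if p is not None]
support = len(covered) / len(X)  # fraction of points with a prediction
accuracy = sum(p == t for p, t in covered) / len(covered)
print(f"levels={len(models)} support={support:.2f} accuracy={accuracy:.2f}")
```

Raising the threshold in this sketch trades support for accuracy: fewer points receive a prediction, but those that do lie further from the noisy decision boundary. In a trading setting, an abstention would correspond to taking no position.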
