Price movements in financial markets are well known to be very noisy. As a result, even if exploitable patterns occasionally exist that machine-learning algorithms could pick up, they are obscured by feature and label noise, which makes the resulting predictions less useful and risky to act on in practice. Traditional rule-learning techniques developed for noisy data, such as CN2, seek only high-precision rules and refrain from making predictions where no rule's antecedent applies. We apply a similar approach, in which a model abstains from making a prediction on data points about which it is uncertain. During training, a cascade of such models is learned in sequence, similar to rule lists, with each model trained only on the data on which the previous model(s) were uncertain. Similar pruning of data takes place at test time, so that predictions are more accurate but are made on only a fraction (the support) of the test data. In a financial prediction setting, such an approach allows decisions to be taken only when the ensemble model is confident, thereby reducing risk. We present results using traditional MLPs as well as differentiable decision trees, on synthetic data as well as real financial market data, to predict fixed-term returns from commonly used features. We submit that our approach is likely to result in better overall returns at a lower level of risk. In this context, we introduce a utility metric to measure the average gain per trade, as well as the return adjusted for downside risk, both of which are improved significantly by our approach.
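The cascade-with-abstention idea described above can be illustrated with a minimal sketch. This is not the models or the metric definitions used in the work itself: a toy logistic regression stands in for the MLPs and differentiable decision trees, the confidence threshold of 0.8 and the function names (`fit_logistic`, `fit_cascade`, `predict_cascade`, `average_gain_per_trade`) are hypothetical, and the gain function assumes a simple long/short trade on the predicted direction.

```python
import math
import random
from typing import Callable, List, Optional, Sequence

ProbFn = Callable[[Sequence[float]], float]  # maps features to P(label == 1)


def fit_logistic(xs, ys, epochs: int = 200, lr: float = 0.1) -> ProbFn:
    """Toy stand-in for an MLP: logistic regression by batch gradient descent."""
    dim, n = len(xs[0]), len(xs)
    w, b = [0.0] * dim, 0.0

    def prob(x: Sequence[float]) -> float:
        z = b + sum(wi * xi for wi, xi in zip(w, x))
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    for _ in range(epochs):
        errs = [prob(x) - y for x, y in zip(xs, ys)]
        grad_w = [sum(e * x[j] for e, x in zip(errs, xs)) / n for j in range(dim)]
        grad_b = sum(errs) / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return prob


def confidence(p: float) -> float:
    """Confidence of a binary prediction: probability of the predicted class."""
    return max(p, 1.0 - p)


def fit_cascade(xs, ys, n_stages: int = 3, threshold: float = 0.8,
                min_samples: int = 10) -> List[ProbFn]:
    """Train stages in sequence; each stage sees only the points on which
    the stage before it was uncertain (assumed rule: confidence < threshold)."""
    stages: List[ProbFn] = []
    for _ in range(n_stages):
        if len(xs) < min_samples or len(set(ys)) < 2:
            break  # too little data left to train another stage
        model = fit_logistic(xs, ys)
        stages.append(model)
        keep = [i for i, x in enumerate(xs) if confidence(model(x)) < threshold]
        xs, ys = [xs[i] for i in keep], [ys[i] for i in keep]
    return stages


def predict_cascade(stages: List[ProbFn], x: Sequence[float],
                    threshold: float = 0.8) -> Optional[int]:
    """Return 0 or 1 from the first confident stage, or None to abstain."""
    for model in stages:
        p = model(x)
        if confidence(p) >= threshold:
            return int(p >= 0.5)
    return None


def average_gain_per_trade(preds: Sequence[Optional[int]],
                           returns: Sequence[float]) -> float:
    """Hypothetical utility: mean signed return over the trades taken,
    going long on a 1 prediction and short on a 0 prediction."""
    gains = [r if p == 1 else -r for p, r in zip(preds, returns) if p is not None]
    return sum(gains) / len(gains) if gains else 0.0


def make_data(n: int, seed: int):
    """Synthetic 1-D data: the label is the sign of a noisy copy of the feature."""
    rng = random.Random(seed)
    xs = [[rng.uniform(-3.0, 3.0)] for _ in range(n)]
    ys = [int(x[0] + rng.gauss(0.0, 0.5) > 0) for x in xs]
    return xs, ys


if __name__ == "__main__":
    train_x, train_y = make_data(400, seed=0)
    test_x, test_y = make_data(200, seed=1)
    stages = fit_cascade(train_x, train_y)
    preds = [predict_cascade(stages, x) for x in test_x]
    covered = [(p, y) for p, y in zip(preds, test_y) if p is not None]
    print("support:", len(covered) / len(test_x))
    print("accuracy on covered points:",
          sum(p == y for p, y in covered) / max(1, len(covered)))
```

The sketch reports accuracy only on the covered points, alongside the support, which mirrors the trade-off stated above: fewer predictions, made only where some stage of the cascade is confident.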