This paper introduces the RUMBoost model, a novel discrete choice modelling approach that combines the interpretability and behavioural robustness of Random Utility Models (RUMs) with the generalisation and predictive ability of deep learning methods. We obtain the full functional form of non-linear utility specifications by replacing each linear parameter in the utility functions of a RUM with an ensemble of gradient boosted regression trees. This enables piece-wise constant utility values to be imputed for all alternatives directly from the data for any possible combination of input variables. We introduce additional constraints on the ensembles to ensure three crucial features of the utility specifications: (i) dependency of the utility of each alternative on only the attributes of that alternative, (ii) monotonicity of marginal utilities, and (iii) an intrinsically interpretable functional form, where the exact response of the model is known throughout the entire input space. Furthermore, we introduce an optimisation-based smoothing technique that replaces the piece-wise constant utility values of alternative attributes with monotonic piece-wise cubic splines, in order to identify non-linear parameters with a defined gradient. We benchmark the RUMBoost model against various ML and Random Utility models on revealed preference mode choice data from London. The results highlight the strong predictive performance and the direct interpretability of our proposed approach. In addition, the smoothed attribute utility functions allow for the calculation of various behavioural indicators and marginal utilities. Finally, we demonstrate the flexibility of our methodology by showing how the RUMBoost model can be extended to complex model specifications, including attribute interactions, correlation within alternative error terms and heterogeneity within the population.
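To make the utility structure described above concrete, the following is a minimal sketch and not the authors' implementation. It assumes each (alternative, attribute) pair already has a fitted ensemble that has been collapsed into split points and piece-wise constant values, and it assumes a multinomial logit link between utilities and choice probabilities. All names and numbers (`ENSEMBLES`, `step_utility`, the split points and leaf values) are hypothetical and chosen only for illustration; the gradient boosting step that would learn them from data is omitted.

```python
import bisect
import math

# Hypothetical, hand-written stand-ins for fitted ensembles: one entry per
# (alternative, attribute) pair, stored as (split_points, leaf_values).
# leaf_values[i] is the utility contribution on the i-th interval.
ENSEMBLES = {
    "walk": {"travel_time": ([0.25, 0.5, 1.0], [0.0, -0.8, -2.0, -4.0])},
    "bus": {
        "travel_time": ([0.25, 0.5, 1.0], [0.0, -0.3, -0.9, -1.8]),
        "cost": ([1.5, 3.0], [0.0, -0.4, -1.1]),
    },
}


def step_utility(split_points, leaf_values, x):
    """Piece-wise constant utility contribution of a single attribute value."""
    return leaf_values[bisect.bisect_right(split_points, x)]


def is_monotone_decreasing(leaf_values):
    """Check feature (ii): utility never increases as the attribute grows."""
    return all(a >= b for a, b in zip(leaf_values, leaf_values[1:]))


def utility(alternative, attributes):
    """Feature (i): only the alternative's own attributes enter its utility."""
    return sum(
        step_utility(*ENSEMBLES[alternative][name], value)
        for name, value in attributes.items()
    )


def choice_probabilities(utilities):
    """Multinomial logit (softmax) over alternatives; assumed link function."""
    u_max = max(utilities.values())  # subtract the max for numerical stability
    exp_u = {alt: math.exp(u - u_max) for alt, u in utilities.items()}
    total = sum(exp_u.values())
    return {alt: e / total for alt, e in exp_u.items()}


# One hypothetical trip: travel times in hours, cost in currency units.
observation = {
    "walk": {"travel_time": 0.7},
    "bus": {"travel_time": 0.4, "cost": 1.5},
}
utilities = {alt: utility(alt, attrs) for alt, attrs in observation.items()}
probabilities = choice_probabilities(utilities)
```

Because every attribute contributes through its own one-dimensional step function, the response of the sketch is known over the whole input space by inspecting `ENSEMBLES`, which mirrors feature (iii). The smoothing step in the abstract would replace each step function with a monotonic piece-wise cubic spline; that step is not shown here.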