Symbolic Regression (SR) is a well-established framework for generating interpretable (white-box) predictive models. Although SR has been successfully applied to create interpretable estimates of the average of the outcome, it is not yet well understood how it can be used to estimate the relationship between variables at other points in the distribution of the target variable. Estimates of, for example, the median or an extreme quantile give a fuller picture of how predictor variables affect the outcome, and they are necessary in high-stakes, safety-critical application domains. This study introduces Symbolic Quantile Regression (SQR), an approach to predicting conditional quantiles with SR. In an extensive evaluation, we find that SQR outperforms other transparent models and performs comparably to a strong black-box baseline without compromising transparency. In an airline fuel usage case study, we also show how SQR can be used to explain differences in the target distribution by comparing models that predict extreme and central outcomes. We conclude that SQR is suitable for predicting conditional quantiles and for understanding how feature influences vary across quantiles.
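The abstract does not state which objective SQR optimizes. As an illustration only, the sketch below assumes the pinball (quantile) loss, the standard objective in quantile regression generally, and shows why minimizing it yields a quantile rather than a mean. The function names `pinball_loss` and `best_constant`, the toy data, and the candidate grid are hypothetical and are not taken from the paper.

```python
import numpy as np


def pinball_loss(y_true, y_pred, tau):
    """Mean pinball (quantile) loss at quantile level tau in (0, 1).

    Under-predictions are weighted by tau and over-predictions by (1 - tau).
    Assumption: this is the generic quantile-regression loss, not necessarily
    the exact fitness function used by SQR.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_true - y_pred
    return float(np.mean(np.maximum(tau * residual, (tau - 1.0) * residual)))


def best_constant(y, tau, candidates):
    """Return the candidate constant prediction with the lowest pinball loss."""
    y = np.asarray(y, dtype=float)
    losses = [pinball_loss(y, np.full_like(y, c), tau) for c in candidates]
    return candidates[int(np.argmin(losses))]


# Toy target with one extreme value (hypothetical data).
y = [1.0, 2.0, 3.0, 4.0, 100.0]
candidates = [1.0, 2.0, 3.0, 4.0, 100.0]

central = best_constant(y, 0.5, candidates)   # selects the median
extreme = best_constant(y, 0.9, candidates)   # selects the upper tail value
print(central, extreme)
```

In a symbolic-regression setting, the constant prediction would be replaced by the output of a candidate expression, and the same loss could be used to rank candidate expressions at each quantile level.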