Symbolic Regression (SR) is a well-established framework for generating interpretable or white-box predictive models. Although SR has been successfully applied to create interpretable estimates of the average of the outcome, it is not yet well understood how it can be used to estimate the relationship between variables at other points in the distribution of the target variable. Estimates at such points, for example the median or an extreme value, provide a fuller picture of how predictive variables affect the outcome and are necessary in high-stakes, safety-critical application domains. This study introduces Symbolic Quantile Regression (SQR), an approach to predicting conditional quantiles with SR. In an extensive evaluation, we find that SQR outperforms transparent models and performs comparably to a strong black-box baseline without compromising transparency. In an airline fuel usage case study, we also show how SQR can be used to explain differences in the target distribution by comparing models that predict extreme outcomes with models that predict central outcomes. We conclude that SQR is suitable for predicting conditional quantiles and for understanding how feature influences vary across quantiles.
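To make the idea of predicting a conditional quantile concrete, here is a minimal sketch of the pinball (quantile) loss, the standard objective in quantile regression. The abstract does not say which objective SQR optimizes, so treating the pinball loss as the score for a candidate expression is an assumption here. The names `pinball_loss` and `candidate_expression` are hypothetical and used for illustration only.

```python
import numpy as np


def pinball_loss(y_true, y_pred, tau):
    """Mean pinball (quantile) loss at quantile level tau, with 0 < tau < 1."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    # Under-predictions (diff > 0) are weighted by tau,
    # over-predictions (diff < 0) by (1 - tau).
    return float(np.mean(np.maximum(tau * diff, (tau - 1.0) * diff)))


def candidate_expression(x):
    # Hypothetical symbolic expression, standing in for whatever an SR search
    # might propose; it is not taken from the paper.
    return 2.0 * x + 1.0


# Assumed usage: score one candidate expression at a chosen quantile level.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.5, 2.5, 6.0, 6.5])
score_at_90th = pinball_loss(y, candidate_expression(x), tau=0.9)
```

With `tau=0.9`, under-predicting costs nine times as much as over-predicting by the same amount, so expressions that minimize this loss are pushed toward the upper tail of the target distribution. With `tau=0.5` the loss is proportional to the mean absolute error and targets the median.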