Flexible Model Aggregation for Quantile Regression

Quantile regression is a fundamental problem in statistical learning, motivated by a need to quantify uncertainty in predictions or to model a diverse population without being overly reductive. For instance, epidemiological forecasts, cost estimates, and revenue predictions all benefit from being able to quantify the range of possible values accurately. As such, many models have been developed for this problem over many years of research in statistics, machine learning, and related fields. Rather than proposing yet another (new) algorithm for quantile regression, we adopt a meta viewpoint: we investigate methods for aggregating any number of conditional quantile models in order to improve accuracy and robustness. We consider weighted ensembles where weights may vary not only over individual models, but also over quantile levels and feature values. All of the models we consider in this paper can be fit using modern deep learning toolkits, and hence are widely accessible (from an implementation point of view) and scalable. To improve the accuracy of the predicted quantiles (or equivalently, prediction intervals), we develop tools for ensuring that quantiles remain monotonically ordered, and apply conformal calibration methods. These can be used without any modification of the original library of base models. We also review some basic theory surrounding quantile aggregation and related scoring rules, and contribute a few new results to this literature (for example, the fact that post sorting or post isotonic regression can only improve the weighted interval score). Finally, we provide an extensive suite of empirical comparisons across 34 data sets from two different benchmark repositories.
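As a rough illustration of two ideas named in the abstract, the following sketch combines several sets of quantile predictions with weights that vary by model and by quantile level, then sorts the aggregated quantiles so they are monotonically ordered. It is not the paper's implementation: the function names `pinball_loss` and `aggregate`, the synthetic data, and the random weights are all assumptions made for the example, and the mean pinball (quantile) loss is used here as a stand-in for the weighted interval score.

```python
import numpy as np

def pinball_loss(y, q, taus):
    """Mean pinball loss. y: (n,), q: (n, m), taus: (m,)."""
    diff = y[:, None] - q
    return np.mean(np.maximum(taus * diff, (taus - 1.0) * diff))

def aggregate(preds, weights):
    """Weighted ensemble of quantile predictions (hypothetical helper).

    preds:   (p, n, m) predictions from p base models at m quantile levels
    weights: (p, m) one weight per model and quantile level; each column sums to 1
    """
    return np.einsum("pnm,pm->nm", preds, weights)

rng = np.random.default_rng(0)
taus = np.array([0.1, 0.25, 0.5, 0.75, 0.9])
n, p = 200, 3

# Synthetic targets and noisy base-model outputs (assumed for illustration).
# The noise makes the predicted quantiles cross for many observations.
y = rng.normal(size=n)
true_q = np.array([-1.2816, -0.6745, 0.0, 0.6745, 1.2816])  # N(0, 1) quantiles
preds = true_q + rng.normal(scale=0.5, size=(p, n, len(taus)))

# Random convex weights per quantile level; a fitted ensemble would learn these.
weights = rng.dirichlet(np.ones(p), size=len(taus)).T  # shape (p, m)

q_agg = aggregate(preds, weights)
q_sorted = np.sort(q_agg, axis=1)  # post sorting: enforce monotone quantiles

loss_raw = pinball_loss(y, q_agg, taus)
loss_sorted = pinball_loss(y, q_sorted, taus)
print(f"pinball loss before sorting: {loss_raw:.4f}")
print(f"pinball loss after sorting:  {loss_sorted:.4f}")
```

In this sketch the sorting step only reorders the aggregated quantiles for each observation, so it can be applied without touching the base models. The loss after sorting should be no larger than the loss before, which is in line with the result the abstract states for the weighted interval score.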

