Flexible modeling of the entire distribution as a function of covariates is an important generalization of mean-based regression that has seen growing interest over the past decades in both the statistics and machine learning literature. This review outlines selected state-of-the-art statistical approaches to distributional regression, complemented with alternatives from machine learning. Topics covered include the similarities and differences between these approaches, extensions, properties and limitations, estimation procedures, and the availability of software. In view of the increasing complexity and availability of large-scale data, this review also discusses the scalability of traditional estimation methods, current trends, and open challenges. Illustrations are provided using data on childhood malnutrition in Nigeria and Australian electricity prices.