Quantile regression is increasingly encountered in modern big data applications due to its robustness and flexibility. We consider the scenario of learning the conditional quantiles of a specific target population when the available data may go beyond the target and be supplemented from other sources that possibly share similarities with the target. A crucial question is how to properly distinguish and utilize useful information from other sources to improve the quantile estimation and inference at the target. We develop transfer learning methods for high-dimensional quantile regression by detecting informative sources whose models are similar to the target and utilizing them to improve the target model. We show that under reasonable conditions, the detection of the informative sources based on sample splitting is consistent. Compared to the naive estimator with only the target data, the transfer learning estimator achieves a much lower error rate as a function of the sample sizes, the signal-to-noise ratios, and the similarity measures among the target and the source models. Extensive simulation studies demonstrate the superiority of our proposed approach. We apply our methods to tackle the problem of detecting hard-landing risk for flight safety and show the benefits and insights gained from transfer learning of three different types of airplanes: Boeing 737, Airbus A320, and Airbus A380.
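To make the two central ideas concrete, the following is a minimal sketch, not the authors' procedure. It assumes an intercept-only quantile model (no covariates, no high-dimensional penalty) and a simplified detection rule: a source counts as informative if pooling it with one half of the target data does not increase the check loss on the held-out half. All function names and the `tolerance` parameter are hypothetical and introduced only for illustration.

```python
def check_loss(residual, tau):
    """Quantile (pinball) loss: rho_tau(u) = u * (tau - 1[u < 0])."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def empirical_risk(q, ys, tau):
    """Average check loss of the constant prediction q on the sample ys."""
    return sum(check_loss(y - q, tau) for y in ys) / len(ys)


def best_constant_quantile(ys, tau):
    """Minimize the empirical check loss over constants.

    The risk is piecewise linear and convex in q, so a minimizer is always
    attained at one of the observed values; searching those is enough.
    """
    return min(sorted(set(ys)), key=lambda q: empirical_risk(q, ys, tau))


def is_informative(target_train, target_holdout, source, tau, tolerance=0.0):
    """Simplified sample-splitting check (an assumption, not the paper's rule).

    Fit on the target training half alone and on target plus source, then
    compare both fits on the held-out target half.
    """
    q_target = best_constant_quantile(target_train, tau)
    q_pooled = best_constant_quantile(target_train + source, tau)
    loss_target = empirical_risk(q_target, target_holdout, tau)
    loss_pooled = empirical_risk(q_pooled, target_holdout, tau)
    return loss_pooled <= loss_target + tolerance


if __name__ == "__main__":
    target_train = [1, 2, 3, 4, 5]
    target_holdout = [1.5, 2.5, 3.5, 4.5, 5.5]
    similar_source = [2, 3, 4]
    shifted_source = [100, 101, 102, 103, 104, 105, 106]

    print(best_constant_quantile(list(range(1, 12)), 0.9))
    print(is_informative(target_train, target_holdout, similar_source, 0.5))
    print(is_informative(target_train, target_holdout, shifted_source, 0.5))
```

In the setting the abstract describes, the constant `q` would be replaced by a penalized linear quantile fit over many covariates. The comparison on held-out target data would still be the step that separates informative sources from the rest.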