Quantile regression is increasingly encountered in modern big data applications due to its robustness and flexibility. We consider the scenario of learning the conditional quantiles of a specific target population when the available data may extend beyond the target and be supplemented by other sources that possibly share similarities with the target. A crucial question is how to properly distinguish and utilize useful information from other sources to improve the quantile estimation and inference at the target. We develop transfer learning methods for high-dimensional quantile regression by detecting informative sources whose models are similar to the target and utilizing them to improve the target model. We show that under reasonable conditions, the detection of the informative sources based on sample splitting is consistent. Compared to the naive estimator that uses only the target data, the transfer learning estimator achieves a much lower error rate, expressed as a function of the sample sizes, the signal-to-noise ratios, and the similarity measures among the target and the source models. Extensive simulation studies demonstrate the superiority of our proposed approach. We apply our methods to the problem of detecting hard-landing risk for flight safety and show the benefits and insights gained from transfer learning across three different types of airplanes: Boeing 737, Airbus A320, and Airbus A380.
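To make the idea of sample-splitting-based source detection concrete, the following is a minimal sketch and not the procedure developed in the paper. It assumes a simple rule: a source is flagged as informative if pooling it with one half of the target data does not noticeably increase the check (pinball) loss on the held-out half, relative to a target-only fit. The function names, the tolerance `tol`, the penalty level `lam`, the proximal subgradient solver, and the simulated data are all illustrative assumptions.

```python
import random


def check_loss(residuals, tau):
    """Average check (pinball) loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    return sum(u * (tau - (1.0 if u < 0 else 0.0)) for u in residuals) / len(residuals)


def resid(X, y, beta):
    """Residuals y_i - x_i' beta for a linear model without intercept."""
    return [yi - sum(b * x for b, x in zip(beta, xi)) for xi, yi in zip(X, y)]


def fit_l1_quantile(X, y, tau, lam=0.01, steps=300, lr0=1.0):
    """L1-penalized quantile regression via proximal subgradient descent.

    Illustrative solver only (assumption): a production implementation
    would typically use linear programming or a dedicated library.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for t in range(steps):
        lr = lr0 / (t + 1) ** 0.5  # decaying step size
        r = resid(X, y, beta)
        # Subgradient of the average check loss with respect to beta.
        grad = [0.0] * p
        for xi, ri in zip(X, r):
            w = tau - (1.0 if ri < 0 else 0.0)
            for j in range(p):
                grad[j] -= w * xi[j] / n
        # Gradient step, then soft-thresholding (proximal map of the L1 penalty).
        for j in range(p):
            z = beta[j] - lr * grad[j]
            shrink = max(abs(z) - lr * lam, 0.0)
            beta[j] = shrink if z > 0 else -shrink
    return beta


def detect_informative(target, sources, tau, tol=0.2):
    """Flag sources whose pooled fit does not hurt held-out target loss.

    The relative threshold `tol` is a hypothetical choice for illustration.
    """
    X, y = target
    half = len(X) // 2
    X_fit, y_fit, X_val, y_val = X[:half], y[:half], X[half:], y[half:]
    # Baseline: fit on the first half of the target, evaluate on the second.
    base = check_loss(resid(X_val, y_val, fit_l1_quantile(X_fit, y_fit, tau)), tau)
    keep = []
    for k, (Xs, ys) in enumerate(sources):
        beta_k = fit_l1_quantile(Xs + X_fit, ys + y_fit, tau)
        loss_k = check_loss(resid(X_val, y_val, beta_k), tau)
        if loss_k <= (1.0 + tol) * base:
            keep.append(k)
    return keep


def simulate(n, beta, rng, noise_sd=0.5):
    """Simulated linear model with Gaussian covariates and noise (assumption)."""
    X = [[rng.gauss(0.0, 1.0) for _ in beta] for _ in range(n)]
    y = [sum(b * x for b, x in zip(beta, xi)) + rng.gauss(0.0, noise_sd) for xi in X]
    return X, y


rng = random.Random(0)
target = simulate(100, [1.0, 2.0, 0.0], rng)
sources = [
    simulate(200, [1.0, 2.0, 0.0], rng),    # same model as the target
    simulate(200, [-5.0, -5.0, 5.0], rng),  # very different model
]
informative = detect_informative(target, sources, tau=0.5)
print("informative source indices:", informative)
```

In this toy setting the first source shares the target's coefficients and the second does not, so only the first should pass the held-out comparison. The sketch is low-dimensional for readability; the high-dimensional setting of the paper would need a properly tuned penalty and a more careful solver.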