The shapes of functions provide highly interpretable summaries of their trajectories. This article develops a novel transfer learning methodology to tackle the challenge of data scarcity in functional linear models. The methodology incorporates samples from the target model (target domain) alongside those from auxiliary models (source domains), transferring knowledge of coefficient shape from the source domains to the target domain. This shape-based transfer learning framework enhances robustness and generalizability in two ways. First, because it is invariant to covariate scaling and signal strength, it ensures reliable knowledge transfer even when data from different sources differ in magnitude. Second, by formalizing the notion of coefficient shape homogeneity, it extends beyond traditional coefficient-equality assumptions to incorporate information from a broader range of source domains. We rigorously analyze the convergence rates of the proposed estimator and examine its minimax optimality. Our findings show that the degree of improvement depends not only on the similarity of coefficient shapes between the target and source domains, but also on the coefficient magnitudes and the spectral decay rates of the covariance operators of the functional covariates. To address situations where only a subset of auxiliary models is informative for the target model, we further develop a data-driven procedure for identifying such informative sources. The effectiveness of the proposed methodology is demonstrated through comprehensive simulation studies and an application to occupation time analysis using physical activity data from the U.S. National Health and Nutrition Examination Survey.
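The contrast between coefficient equality and coefficient shape homogeneity can be illustrated with a small numerical sketch. The abstract does not state the formal definition of shape, so the code below assumes, purely for illustration, that the shape of a coefficient function is the function rescaled to unit L2 norm, and that two coefficients share a shape when one is a positive multiple of the other. The helper names (`l2_norm`, `shape`, `shape_distance`) and the sine-shaped coefficient are hypothetical and are not taken from the paper.

```python
import math


def l2_norm(values, dt):
    """Riemann-sum approximation of the L2 norm on a uniform grid."""
    return math.sqrt(sum(v * v for v in values) * dt)


def shape(values, dt):
    """Rescale a sampled function to unit L2 norm (assumed notion of shape)."""
    norm = l2_norm(values, dt)
    if norm == 0.0:
        raise ValueError("the zero function has no shape")
    return [v / norm for v in values]


def shape_distance(beta_a, beta_b, dt):
    """L2 distance between the unit-norm versions of two sampled functions."""
    shape_a, shape_b = shape(beta_a, dt), shape(beta_b, dt)
    return l2_norm([a - b for a, b in zip(shape_a, shape_b)], dt)


# Uniform midpoint grid on [0, 1].
grid_size = 200
dt = 1.0 / grid_size
grid = [(i + 0.5) * dt for i in range(grid_size)]

# Hypothetical target coefficient, and a source coefficient with the same
# shape but five times the magnitude (for instance, because the source
# covariate is recorded in units five times smaller).
beta_target = [math.sin(2 * math.pi * t) for t in grid]
beta_source = [5.0 * b for b in beta_target]

# Under a coefficient-equality assumption the two models look very different.
coef_gap = l2_norm([a - b for a, b in zip(beta_target, beta_source)], dt)

# Under the assumed shape notion they are indistinguishable.
shape_gap = shape_distance(beta_target, beta_source, dt)

print(f"coefficient gap: {coef_gap:.3f}")
print(f"shape gap:       {shape_gap:.3e}")
```

In this sketch the coefficient gap is large while the shape gap is zero up to floating-point error, which is the sense in which a shape-based criterion can admit source domains that an equality-based criterion would reject. The estimator, its convergence rates, and the source-selection procedure described in the abstract are not reproduced here.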