Modern statistical analysis often involves high-dimensional models with limited sample sizes, which makes estimation based on the target data alone very difficult. How to borrow information from a separate, large source dataset to estimate the target model more accurately therefore becomes an interesting problem, and it motivates the idea of transfer learning. Various estimation methods of this kind have been developed recently. In this work, we study transfer learning from a different perspective: we consider the problem of testing for transfer learning sufficiency. By transfer learning sufficiency (which serves as the null hypothesis), we mean that, with the help of the source data, the useful information contained in the feature vectors of the target data can be sufficiently extracted for predicting the target response of interest. Rejection of the null hypothesis therefore implies that information useful for prediction remains in the feature vectors of the target data and calls for further exploration. To this end, we develop a novel testing procedure and a centered and standardized test statistic, whose asymptotic null distribution is derived analytically. Simulation studies demonstrate the finite-sample performance of the proposed method, and a real data example related to deep learning is presented for illustration.