Modern data-driven applications increasingly involve learning from multiple heterogeneous sources, where a target dataset is limited but related information is available across domains. Naively combining these sources can degrade performance when relevance varies or spurious signals are present, posing a fundamental challenge for trustworthy cross-domain learning. We propose Projection Transfer Learning (ProjectionTL), a unified framework that integrates hierarchical Bayesian modeling with adaptive projection for selective knowledge transfer. The key idea is to decouple transfer at two levels: first, we construct a source-guided hierarchical prior that aggregates information across sources using data-driven weights, capturing global alignment between each source and the target; second, we refine this borrowing through a posterior-projection step that operates at the feature level, selectively retaining coordinates that exhibit local agreement with the target signal. This two-stage design enables the method to simultaneously perform source selection and feature selection, thereby mitigating negative transfer while preserving interpretability. ProjectionTL provides a principled approach to integrating heterogeneous data across domains, bridging statistical modeling and modern machine learning paradigms for robust and interpretable transfer. Through simulations and real-world biomedical applications, we demonstrate improved accuracy, stability, and interpretability compared to existing methods. Our framework offers a scalable and generalizable strategy for trustworthy cross-domain learning in high-dimensional settings.
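As a rough illustration of the two-stage design described in the abstract, the following toy sketch works through source weighting followed by feature-level projection in a linear-regression setting. It is not the authors' implementation: the abstract does not specify the weighting rule, the prior, or the projection criterion. Everything here is an assumption made for illustration, including the function names `ridge` and `projection_tl_sketch`, the softmax weighting with temperature `temp`, the conjugate Gaussian prior with precision `lam`, and the agreement threshold `tol`.

```python
import numpy as np


def ridge(X, y, lam=1.0, prior_mean=None):
    """Posterior mean of a Gaussian linear model with prior N(prior_mean, I / lam).

    With prior_mean=None this is ordinary ridge regression (shrinkage to zero).
    """
    p = X.shape[1]
    m = np.zeros(p) if prior_mean is None else prior_mean
    A = X.T @ X + lam * np.eye(p)
    return np.linalg.solve(A, X.T @ y + lam * m)


def projection_tl_sketch(X_t, y_t, sources, lam=1.0, temp=1.0, tol=0.5):
    """Toy two-stage transfer: source weighting, then coordinate-wise projection.

    ASSUMPTION: the weighting rule, prior, and projection rule are illustrative
    stand-ins, not the method defined in the paper.
    """
    # Target-only estimate and one estimate per source.
    beta_t = ridge(X_t, y_t, lam)
    betas_s = np.array([ridge(X, y, lam) for X, y in sources])

    # Stage 1 (global alignment): data-driven source weights, here a softmax
    # over negative squared distances to the target-only estimate.
    d = np.sum((betas_s - beta_t) ** 2, axis=1)
    w = np.exp(-(d - d.min()) / temp)
    w /= w.sum()

    # Source-guided prior mean, then the target posterior mean under that prior.
    prior_mean = w @ betas_s
    beta_post = ridge(X_t, y_t, lam, prior_mean)

    # Stage 2 (local agreement): keep the borrowed value only on coordinates
    # where the prior mean is within tol of the target-only estimate;
    # elsewhere fall back to the target-only estimate.
    keep = np.abs(prior_mean - beta_t) <= tol
    beta_proj = np.where(keep, beta_post, beta_t)
    return beta_proj, w, keep


def make_data(rng, n, beta, noise=0.1):
    """Simulate a Gaussian linear model y = X beta + noise."""
    X = rng.standard_normal((n, beta.size))
    return X, X @ beta + noise * rng.standard_normal(n)


rng = np.random.default_rng(0)
beta_true = np.array([1.0, -1.0, 0.5, 0.0, 0.0])

# Source 0 matches the target except for one spurious coordinate (index 4).
# Source 1 is globally misaligned with the target.
beta_src0 = beta_true + np.array([0.0, 0.0, 0.0, 0.0, 2.0])
beta_src1 = -3.0 * beta_true

X_t, y_t = make_data(rng, 30, beta_true)
sources = [make_data(rng, 200, beta_src0), make_data(rng, 200, beta_src1)]

beta_proj, w, keep = projection_tl_sketch(X_t, y_t, sources)
print("source weights:", np.round(w, 3))
print("coordinates kept from the prior:", keep)
```

In this simulated setup the misaligned source is down-weighted at the first stage, while the spurious coordinate of the otherwise useful source is filtered at the second stage, which is the division of labor the abstract attributes to source selection and feature selection. A fully Bayesian treatment would place priors on the weights and propagate posterior uncertainty rather than plugging in point estimates as done here.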