Model-assisted and Knowledge-guided Transfer Regression for the Underrepresented Population

Covariate shift and outcome model heterogeneity are two prominent challenges in leveraging external sources to improve risk modeling for underrepresented cohorts when accurate labels are scarce. We consider a transfer learning problem targeting an unlabeled minority sample that faces (i) covariate shift relative to a labeled source sample collected on a different cohort; and (ii) outcome model heterogeneity relative to a majority sample that is informative for the targeted minority model. For this setting, we develop a novel model-assisted and knowledge-guided transfer learning targeting underrepresented population (MAKEUP) approach for high-dimensional regression models. MAKEUP combines a model-assisted debiasing step that addresses the covariate shift with a knowledge-guided sparsifying procedure that leverages the majority data to enhance learning on the minority group. We also develop a model selection method that avoids negative knowledge transfer and works without gold standard labels on the target sample. Theoretical analyses show that MAKEUP provides efficient estimation for the target model on the minority group. It is robust to high complexity and misspecification of the nuisance models used for covariate shift correction, and it adapts to model heterogeneity and potential negative transfer between the majority and minority groups. Numerical studies demonstrate similar advantages over existing approaches in finite samples. We also illustrate our approach through a real-world application: transferring Type II diabetes genetic risk models to an underrepresented ancestry group.
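
To make the two ingredients named in the abstract more concrete, the following is a minimal Python sketch and not the MAKEUP estimator itself. It rests on assumptions that do not come from the paper: plain importance-weighted least squares stands in for the model-assisted debiasing step, the density-ratio weights are taken as given, and soft-thresholding the gap to the majority coefficients stands in for the knowledge-guided sparsifying procedure. The function names `weighted_least_squares`, `soft_threshold`, and `shrink_toward_majority` are hypothetical.

```python
import numpy as np


def weighted_least_squares(X, y, weights):
    """Least squares with per-sample weights (assumed density ratios
    target/source). This is a stand-in for covariate shift correction,
    not the paper's model-assisted debiasing step."""
    sqrt_w = np.sqrt(np.asarray(weights, dtype=float))
    Xw = np.asarray(X, dtype=float) * sqrt_w[:, None]
    yw = np.asarray(y, dtype=float) * sqrt_w
    beta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    return beta


def soft_threshold(x, lam):
    """Elementwise soft-thresholding: shrink toward zero by lam."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def shrink_toward_majority(beta_minority_init, beta_majority, lam):
    """Keep the majority coefficients and add back only the differences
    whose magnitude exceeds lam (an assumed, simplified form of
    knowledge-guided sparsification)."""
    beta_majority = np.asarray(beta_majority, dtype=float)
    delta = np.asarray(beta_minority_init, dtype=float) - beta_majority
    return beta_majority + soft_threshold(delta, lam)
```

In this sketch, small gaps between the initial minority estimate and the majority model are treated as noise and set to zero, while large gaps survive in shrunken form, which is one simple way to borrow strength from the majority data without forcing the two models to coincide. Choosing `lam` would in practice require a selection rule such as the label-free model selection the abstract mentions, which is not reproduced here.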
