Learning from weak, proxy, or relative supervision is common when ground-truth labels are unavailable, but its robustness under distribution shift remains poorly understood because the supervision mechanism itself may change across environments. We formalize this phenomenon as supervision drift, defined as changes in $P(y \mid x, c)$ across contexts $c$, and study it in CRISPR-Cas13d transcriptomic perturbation experiments, where guide efficacy is inferred indirectly from RNA-seq responses. Using publicly available data spanning two human cell lines and multiple post-induction timepoints, we construct a controlled non-IID benchmark with explicit domain (cell line) and temporal shifts, reusing a fixed weak-label construction across all contexts so that the target definition itself does not change. Across linear and tree-based models, weak supervision supports meaningful in-domain learning (ridge $R^2 = 0.356$, Spearman $\rho = 0.442$) and partial cross-cell-line transfer ($\rho \approx 0.40$). In contrast, temporal transfer collapses across all model classes considered, yielding negative $R^2$ and weak or near-zero $\rho$ (ridge $R^2 = -0.145$, $\rho = 0.008$; XGBoost $R^2 = -0.155$, $\rho = 0.056$; random forest $R^2 = -0.322$, $\rho = 0.139$). Additional robustness analyses using externally recomputed weak labels, shift-score quantification, and simple mitigation baselines preserve the same qualitative pattern. Feature-label associations and feature importances remain relatively stable across cell lines but change sharply over time, indicating that the failures arise from supervision drift rather than from limited model capacity or simple covariate shift. These results show that strong in-domain performance under weak supervision can be misleading, and they motivate feature stability as a lightweight diagnostic for non-transferability before deployment.
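The abstract does not specify how feature stability is computed, so the following is only a minimal sketch of one way such a diagnostic could look, not the authors' procedure. It assumes that each context provides a feature matrix and a weak label, computes the Spearman correlation of every feature with the label in each context, and then correlates the two resulting association profiles. All function names (`association_profile`, `feature_stability`, `make_context`) and the synthetic data are hypothetical and serve illustration only.

```python
import random
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0


def spearman(a, b):
    """Spearman rank correlation, computed as Pearson on ranks."""
    return pearson(ranks(a), ranks(b))


def association_profile(X, y):
    """Spearman correlation of each feature column with the weak label."""
    n_features = len(X[0])
    return [spearman([row[j] for row in X], y) for j in range(n_features)]


def feature_stability(X_src, y_src, X_tgt, y_tgt):
    """Correlation between the source and target association profiles.

    Values near 1 suggest the feature-label relationship is preserved;
    values near 0 or below suggest it has changed between contexts.
    """
    return pearson(association_profile(X_src, y_src),
                   association_profile(X_tgt, y_tgt))


def make_context(weights, n=300, noise=0.5, seed=0):
    """Synthetic context (hypothetical): label = linear signal + noise."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in weights] for _ in range(n)]
    y = [sum(w * x for w, x in zip(weights, row)) + rng.gauss(0, noise)
         for row in X]
    return X, y


# Illustrative use on synthetic data (not the paper's data).
w = [1.0, -0.8, 0.5, 0.0, 0.3]
X_a, y_a = make_context(w, seed=1)                    # source context
X_b, y_b = make_context(w, seed=2)                    # same label mechanism
X_c, y_c = make_context([-v for v in w], seed=3)      # flipped mechanism

print("same mechanism:", feature_stability(X_a, y_a, X_b, y_b))
print("drifted mechanism:", feature_stability(X_a, y_a, X_c, y_c))
```

In this toy setup the feature distributions are identical in all three contexts and only the mapping from features to label changes, which is the situation the abstract distinguishes from simple covariate shift. Comparing association profiles is cheap because it requires no model fitting, but it only captures monotone, per-feature relationships.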