Deep learning models for medical data are typically trained with task-specific objectives that encourage representations to collapse onto a small number of discriminative directions. While effective for individual prediction problems, this paradigm underutilizes the rich structure of clinical data and limits the transferability, stability, and interpretability of learned features. In this work, we propose dense feature learning, a representation-centric framework that explicitly shapes the linear structure of medical embeddings. Our approach operates directly on embedding matrices, encouraging spectral balance, subspace consistency, and feature orthogonality through objectives defined entirely in terms of linear-algebraic properties. Without relying on labels or generative reconstruction, dense feature learning produces representations with higher effective rank, improved conditioning, and greater stability over time. Empirical evaluations across longitudinal EHR data, clinical text, and multimodal patient representations demonstrate consistent improvements in downstream linear performance, robustness, and subspace alignment compared to supervised and self-supervised baselines. These results suggest that learning to span clinical variation may be as important as learning to predict clinical outcomes, and position representation geometry as a first-class objective in medical AI.
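To make the geometric quantities concrete, the sketch below shows one common way to measure the effective rank (exponential of the entropy of the normalized singular-value spectrum) and the conditioning of an embedding matrix. This is an illustrative diagnostic only, not the paper's implementation; the synthetic "dense" and "collapsed" matrices are hypothetical stand-ins for learned embeddings.

```python
import numpy as np

def effective_rank(E):
    # Effective rank: exp of the Shannon entropy of the normalized
    # singular-value distribution of the embedding matrix E.
    s = np.linalg.svd(E, compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]
    return float(np.exp(-(p * np.log(p)).sum()))

def condition_number(E):
    # Ratio of the largest to the smallest nonzero singular value;
    # lower values indicate a better-conditioned embedding matrix.
    s = np.linalg.svd(E, compute_uv=False)
    return float(s.max() / s[s > 1e-12].min())

rng = np.random.default_rng(0)
# "Dense" embeddings: near-isotropic, variance spread over all directions.
dense = rng.normal(size=(256, 32))
# "Collapsed" embeddings: concentrated along one discriminative direction,
# mimicking the collapse induced by narrow task-specific objectives.
collapsed = (rng.normal(size=(256, 1)) @ rng.normal(size=(1, 32))
             + 0.01 * rng.normal(size=(256, 32)))

print(effective_rank(dense), effective_rank(collapsed))
print(condition_number(dense), condition_number(collapsed))
```

Under this measure, the isotropic matrix scores an effective rank close to its ambient dimension while the collapsed matrix scores close to one, which is the contrast the proposed objectives are designed to penalize.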