Integrating domain knowledge into deep learning has emerged as a promising direction for improving model interpretability, generalization, and data efficiency. In this work, we present a novel knowledge-guided Masked Autoencoder, built on a Vision Transformer (ViT), that embeds scientific domain knowledge in the self-supervised reconstruction process. Rather than relying solely on data-driven optimization, the proposed approach incorporates two physically based terms: the Linear Spectral Mixing Model (LSMM), used as a physical constraint, and the Spectral Angle Mapper (SAM). Together they encourage the learned representations to adhere to the known structural relationships between observed signals and their latent components. The framework jointly optimizes the LSMM and SAM losses with a conventional Huber loss, promoting both numerical accuracy and geometric consistency in the feature space. This knowledge-guided design improves reconstruction fidelity, stabilizes training under limited supervision, and yields interpretable latent representations grounded in physical principles. Experiments indicate that the proposed model substantially improves reconstruction quality and downstream task performance, highlighting the promise of embedding physics-informed inductive biases in transformer-based self-supervised learning.
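To make the joint objective concrete, the following is a minimal NumPy sketch of how a Huber term, a SAM term, and an LSMM term could be combined into one reconstruction loss. It is an illustration under stated assumptions, not the implementation used in this work: the function names, the loss weights `w_sam` and `w_lsmm`, the mean-squared form of the LSMM residual, and the availability of an abundance matrix and an endmember matrix are all hypothetical choices made for the example.

```python
import numpy as np


def huber_loss(pred, target, delta=1.0):
    """Mean Huber loss: quadratic for small errors, linear for large ones."""
    err = np.abs(pred - target)
    quadratic = 0.5 * err ** 2
    linear = delta * (err - 0.5 * delta)
    return float(np.mean(np.where(err <= delta, quadratic, linear)))


def sam_loss(pred, target, eps=1e-12):
    """Mean spectral angle (radians) between spectra of shape (n_pixels, n_bands)."""
    dot = np.sum(pred * target, axis=-1)
    norms = np.linalg.norm(pred, axis=-1) * np.linalg.norm(target, axis=-1)
    # Guard against zero-norm spectra, then clip for numerical safety.
    cos = np.clip(dot / np.maximum(norms, eps), -1.0, 1.0)
    return float(np.mean(np.arccos(cos)))


def lsmm_loss(pred, abundances, endmembers):
    """Mean squared residual between the reconstruction and a linear mixture.

    Assumed shapes (hypothetical): abundances is (n_pixels, n_endmembers),
    endmembers is (n_endmembers, n_bands). The linear spectral mixing model
    states that each spectrum is a weighted sum of endmember spectra.
    """
    mixed = abundances @ endmembers
    return float(np.mean((pred - mixed) ** 2))


def total_loss(pred, target, abundances, endmembers,
               w_sam=0.1, w_lsmm=0.1, delta=1.0):
    """Weighted sum of the three terms; the weights are illustrative only."""
    return (huber_loss(pred, target, delta)
            + w_sam * sam_loss(pred, target)
            + w_lsmm * lsmm_loss(pred, abundances, endmembers))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    endmembers = rng.uniform(0.1, 1.0, size=(3, 8))    # 3 endmembers, 8 bands
    abundances = rng.dirichlet(np.ones(3), size=5)     # non-negative, rows sum to 1
    target = abundances @ endmembers                   # noise-free mixed spectra
    pred = target + 0.05 * rng.standard_normal(target.shape)
    print(total_loss(pred, target, abundances, endmembers))
```

In this sketch the Huber term penalizes per-band magnitude errors, the SAM term penalizes differences in spectral shape regardless of scale, and the LSMM term penalizes reconstructions that cannot be explained as a linear mixture of endmembers. In a masked autoencoder these terms would typically be evaluated on the reconstructed patches, and the abundances would come from the model or from an unmixing step; how that is done here is not specified in the abstract.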