Transferring recent advances in deep learning to scientific disciplines is hindered by the lack of large-scale training datasets. We argue that in these knowledge-rich domains, the established body of scientific theory provides reliable inductive biases in the form of governing physical laws. We address the ill-posed inverse problem of recovering Raman spectra from noisy Coherent Anti-Stokes Raman Scattering (CARS) measurements, in which the true Raman signal is suppressed by a dominant non-resonant background. We propose RamPINN, a model that learns to recover Raman spectra from given CARS spectra. Our core methodological contribution is a physics-informed neural network with a dual-decoder architecture that disentangles the resonant and non-resonant signals. It does so by enforcing the Kramers-Kronig causality relations through a differentiable Hilbert transform loss on the resonant part of the signal and by imposing a smoothness prior on the non-resonant part. Trained entirely on synthetic data, RamPINN demonstrates strong zero-shot generalization to real-world experimental data, closing the synthetic-to-real gap and significantly outperforming existing baselines. Furthermore, we show that training with these physics-based losses alone, without access to any ground-truth Raman spectra, still yields competitive results. This work highlights a broader concept: formal scientific rules can act as a potent inductive bias, enabling robust, self-supervised learning in data-limited scientific domains.
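To make the two physics-based terms named in the abstract concrete, the following is a minimal NumPy sketch of a Kramers-Kronig consistency loss built on an FFT-based Hilbert transform, together with a second-difference smoothness penalty. It is an illustration under stated assumptions, not the RamPINN implementation: the function names (`hilbert_transform`, `kk_loss`, `smoothness_loss`), the squared-error form of both terms, the sign convention (real part equals the Hilbert transform of the imaginary part), and the periodic, uniformly sampled frequency axis are all hypothetical choices. The abstract does not specify how the losses are formulated or weighted.

```python
import numpy as np


def hilbert_transform(x):
    """Discrete Hilbert transform of a real signal along the last axis.

    Uses the frequency-domain multiplier -i * sign(f), which treats the
    signal as periodic and uniformly sampled (an assumption of this sketch).
    """
    n = x.shape[-1]
    sign = np.sign(np.fft.fftfreq(n))
    if n % 2 == 0:
        sign[n // 2] = 0.0  # zero the Nyquist bin so the output stays real
    spectrum = np.fft.fft(x, axis=-1)
    return np.real(np.fft.ifft(-1j * sign * spectrum, axis=-1))


def kk_loss(real_part, imag_part):
    """Mean squared violation of the assumed relation real = H[imag].

    The sign of the Kramers-Kronig relation depends on the Fourier
    convention; this choice is an assumption, not taken from the paper.
    """
    residual = real_part - hilbert_transform(imag_part)
    return float(np.mean(residual ** 2))


def smoothness_loss(background):
    """Mean squared second difference: small for slowly varying signals."""
    return float(np.mean(np.diff(background, n=2, axis=-1) ** 2))


# Usage on toy signals (hypothetical data, for illustration only).
theta = 2.0 * np.pi * np.arange(256) / 256
imag_part = np.cos(3 * theta)          # stand-in for a predicted Raman-like part
real_part = np.sin(3 * theta)          # its Hilbert partner under this convention
background = np.linspace(0.0, 1.0, 256)  # stand-in for a smooth non-resonant part

print("KK loss (consistent pair):", kk_loss(real_part, imag_part))
print("KK loss (inconsistent pair):", kk_loss(imag_part, imag_part))
print("Smoothness loss (linear ramp):", smoothness_loss(background))
```

Because the transform consists only of an FFT, an elementwise multiplication, and an inverse FFT, the same computation can be expressed in an automatic-differentiation framework so that gradients flow through it during training. Applying the causality term to one decoder output and the smoothness term to the other mirrors the dual-decoder split described above, though the exact pairing of outputs in RamPINN is not given here.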