Small-molecule identification from tandem mass spectrometry (MS/MS) remains a bottleneck in untargeted settings where spectral libraries are incomplete. While deep learning offers a solution, current approaches typically fall into two extremes: explicit generative models that construct molecular graphs atom-by-atom, or joint contrastive models that learn cross-modal subspaces from scratch. We introduce SpecBridge, a novel implicit alignment framework that treats structure identification as a geometric alignment problem. SpecBridge fine-tunes a self-supervised spectral encoder (DreaMS) to project directly into the latent space of a frozen molecular foundation model (ChemBERTa), and then performs retrieval by cosine similarity to a fixed bank of precomputed molecular embeddings. Across MassSpecGym, Spectraverse, and MSnLib benchmarks, SpecBridge improves top-1 retrieval accuracy by roughly 20-25% relative to strong neural baselines, while keeping the number of trainable parameters small. These results suggest that aligning to frozen foundation models is a practical, stable alternative to designing new architectures from scratch. The code for SpecBridge is released at https://github.com/HassounLab/SpecBridge.
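The retrieval step described in the abstract, ranking a fixed bank of precomputed molecular embeddings by cosine similarity to a projected spectrum embedding, can be illustrated with a minimal sketch. This is not the SpecBridge implementation: the function names `l2_normalize` and `retrieve_top_k` are hypothetical, and the small hand-written vectors merely stand in for the output of the fine-tuned spectral encoder and for the frozen molecular embeddings.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit L2 norm."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def retrieve_top_k(query_embedding, candidate_bank, k=1):
    """Rank candidates by cosine similarity to a single query embedding.

    query_embedding: shape (d,), assumed to be a spectrum already projected
        into the molecular latent space (toy data here).
    candidate_bank: shape (n, d), assumed precomputed molecular embeddings.
    Returns the indices of the k best candidates and their similarities.
    """
    q = l2_normalize(np.asarray(query_embedding, dtype=np.float64))
    bank = l2_normalize(np.asarray(candidate_bank, dtype=np.float64))
    scores = bank @ q  # dot product of unit vectors equals cosine similarity
    k = min(k, scores.shape[0])
    top = np.argsort(-scores, kind="stable")[:k]
    return top, scores[top]


# Toy example: three candidate "molecules" in a 3-dimensional latent space.
bank = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.6, 0.8, 0.0],
])
query = np.array([0.5, 0.9, 0.0])

indices, scores = retrieve_top_k(query, bank, k=2)
print(indices, scores)
```

Because the bank is fixed, its normalization could be done once ahead of time, so that each query costs a single matrix-vector product; the sketch normalizes on every call only to stay short.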