Post-translational modifications (PTMs) form a combinatorial "code" that regulates protein function, yet deciphering this code, that is, linking modified sites to the enzymes that catalyze them, remains a central unsolved problem in understanding cellular signaling and disease. We introduce COMPASS-PTM, a mechanism-aware, coarse-to-fine learning framework that unifies residue-level PTM profiling with enzyme-substrate assignment. COMPASS-PTM integrates evolutionary representations from protein language models with physicochemical priors and a crosstalk-aware prompting mechanism that explicitly models inter-PTM dependencies. This design allows the model to learn biologically coherent patterns of cooperative and antagonistic modifications while addressing the dual long-tail distribution of PTM data. Across multiple proteome-scale benchmarks, COMPASS-PTM establishes new state-of-the-art performance, including a 122% relative F1 improvement in multi-label site prediction and a 54% gain in zero-shot enzyme assignment. Beyond accuracy, the model generalizes interpretably, recovering canonical kinase motifs and predicting disease-associated PTM rewiring caused by missense variants. By bridging statistical learning with biochemical mechanism, COMPASS-PTM brings site-level and enzyme-level prediction into a single framework that learns the grammar underlying protein regulation and signaling.