Accurate prediction of molecular properties underpins drug discovery and materials design, yet even state-of-the-art models remain vulnerable to localized failure modes that aggregate metrics cannot detect. The regions where molecular similarity should be most helpful are also those where standard evaluation can be most misleading. Property cliffs expose this gap: structurally similar molecules can still differ sharply in their target property, so models with competitive overall performance may fail in high-risk local neighborhoods. To expose and mitigate this failure mode, we introduce CliffSplit, a cliff-aware evaluation protocol that constructs locally supported, cliff-exposed test cases, and CliffLoss, a model-agnostic mitigation mechanism for cliff-sensitive errors that is applied only at training time. Experiments on three QM9 targets and three MoleculeNet tasks across five backbones show that CliffSplit reveals at least 15% higher error in cliff-heavy QM9 regions, while CliffLoss reduces the cliff-to-smooth error gap by up to 30% on Lipophilicity and lowers overall MAE by 9.7%. Together, these results turn molecular similarity failure from a descriptive anomaly into a benchmarked evaluation problem for molecular machine learning. The code is available at https://anonymous.4open.science/r/Cliff_Loss.