Bond-Centered Molecular Fingerprint Derivatives: A BBBP Dataset Study

Bond-Centered Fingerprints (BCFPs) are a complementary, bond-centric alternative to Extended-Connectivity Fingerprints (ECFPs). We introduce a static BCFP that mirrors the bond convolution used by directed message-passing GNNs such as ChemProp, and evaluate it with a fast Random Forest model on the Blood-Brain Barrier Penetration (BBBP) classification task. Across stratified cross-validation, concatenating ECFP with BCFP consistently improves AUROC and AUPRC over either descriptor alone, as confirmed by Tukey HSD multiple-comparison analysis. Among the radii tested, r = 1 performs best; r = 2 does not yield statistically separable gains under the same test. We further propose BCFP-Sort&Slice, a simple feature-combination scheme that preserves the out-of-vocabulary (OOV) count information native to ECFP count vectors while enabling compact, unhashed concatenation of BCFP variants. Using these composite features, which combine bond- and atom-centered fingerprints, we also outperform the MGTP predictions in our BBBP evaluation. These results show that lightweight bond-centered descriptors can complement atom-centered circular fingerprints and provide strong, fast baselines for BBBP prediction.
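The abstract does not spell out the featurization, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes: molecules given as plain atom/bond lists (no RDKit) to stay self-contained; a simplified atom invariant of (element, degree) rather than ECFP's full invariants; one identifier per directed bond v->w, updated like a ChemProp-style D-MPNN step that aggregates bonds entering v except the reverse bond w->v; and a Sort&Slice step that keeps the most frequent training identifiers and sends all remaining counts to one extra OOV slot. The names `bcfp_counts`, `fit_sort_and_slice` and `transform` are hypothetical.

```python
import hashlib
from collections import Counter


def _stable_id(obj) -> int:
    """Deterministic 32-bit identifier (built-in hash() is salted for strings)."""
    digest = hashlib.blake2b(repr(obj).encode(), digest_size=4).digest()
    return int.from_bytes(digest, "little")


def bcfp_counts(atoms, bonds, radius=1):
    """Hypothetical count-based bond-centered fingerprint.

    atoms: list of element symbols; bonds: list of (i, j, bond_order).
    Each undirected bond is expanded into two directed bonds, as in a D-MPNN.
    """
    nbrs = {i: [] for i in range(len(atoms))}
    for i, j, _ in bonds:
        nbrs[i].append(j)
        nbrs[j].append(i)

    def atom_inv(i):
        # Simplified atom invariant (assumption): (element, heavy-atom degree).
        return (atoms[i], len(nbrs[i]))

    directed = [(i, j, o) for i, j, o in bonds] + [(j, i, o) for i, j, o in bonds]
    # Radius 0: identifier of v->w from (source atom, bond order, target atom).
    init = {(v, w): _stable_id((atom_inv(v), o, atom_inv(w))) for v, w, o in directed}
    counts = Counter(init.values())
    current = init
    for _ in range(radius):
        updated = {}
        for v, w in current:
            # Message for v->w: bonds k->v entering v, excluding the reverse bond w->v.
            incoming = sorted(current[(k, v)] for k in nbrs[v] if k != w)
            # Combine with the radius-0 bond identifier, mirroring h_vw^0 + message.
            updated[(v, w)] = _stable_id((init[(v, w)], tuple(incoming)))
        counts.update(updated.values())
        current = updated
    return counts


def fit_sort_and_slice(train_counts, length):
    """Keep the `length` identifiers that occur in the most training molecules."""
    doc_freq = Counter()
    for counts in train_counts:
        doc_freq.update(counts.keys())
    ranked = sorted(doc_freq, key=lambda f: (-doc_freq[f], f))  # ties broken by id
    return ranked[:length]


def transform(counts, vocab):
    """Count vector over `vocab`, plus a final slot summing all OOV counts."""
    index = {f: i for i, f in enumerate(vocab)}
    vec = [0] * (len(vocab) + 1)
    for f, n in counts.items():
        vec[index.get(f, len(vocab))] += n
    return vec


if __name__ == "__main__":
    ethanol = (["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
    propane = (["C", "C", "C"], [(0, 1, 1), (1, 2, 1)])
    train = [bcfp_counts(*mol, radius=1) for mol in (ethanol, propane)]
    vocab = fit_sort_and_slice(train, length=4)
    print([transform(c, vocab) for c in train])
```

Because the OOV slot absorbs every count outside the vocabulary, each vector's entries still sum to the molecule's total number of bond identifiers, which is one way to keep the count information that hashing into a fixed-length vector would otherwise fold together. Under the same assumptions, an atom-centered count fingerprint fitted with its own vocabulary could be concatenated with these vectors per molecule before training the Random Forest.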
