Recent algorithmic work has rapidly improved the certified robustness of complex ML models. However, current robustness certification methods can certify only within a limited perturbation radius. Given that purely data-driven statistical approaches have reached a bottleneck, we propose integrating statistical ML models with knowledge (expressed as logical rules) as a reasoning component, using Markov logic networks (MLN), to further improve the overall certified robustness. This raises new research questions about certifying the robustness of such a paradigm, especially of the reasoning component (e.g., MLN). As a first step towards understanding these questions, we prove that certifying the robustness of MLN is #P-hard. Guided by this hardness result, we then derive the first certified robustness bound for MLN by carefully analyzing different model regimes. Finally, we conduct extensive experiments on five datasets, including both high-dimensional images and natural language texts, and show that certified robustness with knowledge-based logical reasoning significantly outperforms the state of the art.