Deep learning-based recommender systems have become an integral part of many online platforms. However, their black-box nature underscores the need for explainable artificial intelligence (XAI) approaches that provide human-understandable reasons why a specific item is recommended to a given user. One such approach is the counterfactual explanation (CF). While CFs can be highly beneficial for users and system designers, malicious actors may also exploit these explanations to undermine the system's security. In this work, we propose H-CARS, a novel strategy to poison recommender systems via CFs. Specifically, we first train a logical-reasoning-based surrogate model on training data derived from counterfactual explanations. By reversing the learning process of the recommendation model, we then develop an efficient greedy algorithm that generates fabricated user profiles and their associated interaction records for this surrogate model. Our experiments, which employ a well-known CF generation method and are conducted on two distinct datasets, show that H-CARS achieves significant attack performance.