Data poisoning attacks on clustering algorithms have received limited attention, and existing methods struggle to scale efficiently as dataset sizes and feature counts increase. These attacks typically require re-clustering the entire dataset multiple times to generate predictions and assess the attacker's objectives, which significantly hinders their scalability. This paper addresses these limitations by proposing Sonic, a novel genetic data poisoning attack that leverages incremental and scalable clustering algorithms, e.g., FISHDBC, as surrogates to accelerate poisoning attacks against graph-based and density-based clustering methods, such as HDBSCAN. We empirically demonstrate the effectiveness and efficiency of Sonic in poisoning the target clustering algorithms. We then conduct a comprehensive analysis of the factors affecting the scalability and transferability of poisoning attacks against clustering algorithms, and we conclude by examining how robust Sonic is to the choice of its hyperparameters.