Learned index structures achieve high performance by modeling the cumulative distribution function (CDF) of keys, but this reliance on the data distribution makes them potentially vulnerable to adversarial manipulation. Prior work has explored both static data poisoning and dynamic algorithmic complexity attacks (ACA), but existing evaluations are typically limited in scale or consider only one threat model. We present a systematic study of both attack paradigms on ALEX, a state-of-the-art dynamic learned index, under a unified and reproducible framework. Our evaluation scales to realistic workloads with up to 200K adversarial inserts and covers multiple SOSD datasets with diverse key distributions, together with a real-key baseline that isolates adversarial effects. Our results show a clear separation between the two threat models. Static poisoning has minimal impact on lookup performance in ALEX under bulk-loaded settings, whereas dynamic ACA causes substantial degradation, reducing lookup throughput by up to 2--2.8x. However, attack effectiveness is highly dataset-dependent: dense key distributions limit adversarial leverage, because insertions become duplicate-heavy and ALEX's structure is localized. We highlight key evaluation considerations, including the need for control workloads and the mismatch between localized structural damage and global query metrics. These results show that robustness in learned indexes depends critically on the interaction between threat model, data distribution, and evaluation methodology.