Algorithmic Complexity Attacks on Dynamic Learned Indexes

Learned Index Structures (LIS) treat a sorted index as a model that learns the data distribution: given a key as input, the model outputs the key's predicted position. The original LIS supports only lookups, not updates, which makes it impractical for typical workloads. To address this limitation, recent studies have focused on designing efficient dynamic learned indexes. ALEX, the pioneering dynamic learned index structure, achieves dynamism through a series of design choices, including adaptive key space partitioning, dynamic model retraining, and sophisticated engineering and policies that prioritize read/write performance. While these design choices improve average-case performance, the emphasis on flexibility and performance enlarges the attack surface by allowing adversarial behaviors that maximize ALEX's memory consumption and time complexity in worst-case scenarios. In this work, we present the first systematic investigation of algorithmic complexity attacks (ACAs) targeting the worst-case scenarios of ALEX. We introduce new ACAs in two categories, space ACAs and time ACAs, which target memory consumption and time complexity, respectively. First, our space ACA on data nodes exploits ALEX's gapped array layout and uses the Multiple-Choice Knapsack (MCK) problem to generate an optimal adversarial insertion plan that maximizes memory consumption at the data node level. Second, our space ACA on internal nodes exploits ALEX's catastrophic cost mitigation mechanism, causing an out-of-memory error with only a few hundred adversarial insertions. Third, our time ACA generates pathological insertions that widen the gap between the actual key distribution and the linear models of data nodes, degrading runtime performance by up to 1,641× compared to ALEX operating under legitimate workloads.
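To make the core idea concrete, the following is a minimal sketch of a single learned-index node: a linear model fitted to sorted keys that predicts each key's position. It is not ALEX's implementation and not the attack described above. The helper names (`fit_linear`, `predict`, `mean_abs_error`) and the key values are hypothetical, and the sketch ignores gapped arrays, node splits, and cost models. It only illustrates how a dense cluster of inserted keys can leave a linear model with a large prediction error even after refitting, which is the kind of disparity the time ACA aims to increase.

```python
def fit_linear(keys):
    """Least-squares fit of position ~ slope * key + intercept over sorted keys."""
    n = len(keys)
    mean_key = sum(keys) / n
    mean_pos = (n - 1) / 2
    var = sum((k - mean_key) ** 2 for k in keys)
    if var == 0:
        return 0.0, mean_pos
    cov = sum((k - mean_key) * (i - mean_pos) for i, k in enumerate(keys))
    slope = cov / var
    return slope, mean_pos - slope * mean_key


def predict(model, key, n):
    """Predicted array position for key, clamped to the valid range [0, n - 1]."""
    slope, intercept = model
    pos = int(round(slope * key + intercept))
    return min(max(pos, 0), n - 1)


def mean_abs_error(keys):
    """Average distance between predicted and true positions after (re)fitting."""
    model = fit_linear(keys)
    n = len(keys)
    return sum(abs(predict(model, k, n) - i) for i, k in enumerate(keys)) / n


# Hypothetical "legitimate" keys: evenly spaced, so a linear model fits well.
clean_keys = [i * 1000 for i in range(100)]

# Hypothetical adversarial insertions: 100 keys packed into a tiny key range.
cluster = list(range(50001, 50101))
poisoned_keys = sorted(clean_keys + cluster)

clean_err = mean_abs_error(clean_keys)
poisoned_err = mean_abs_error(poisoned_keys)

print("mean prediction error, clean keys:   ", clean_err)
print("mean prediction error, poisoned keys:", poisoned_err)
```

A larger prediction error means that a search starting from the predicted position has further to travel before it finds the key, so lookups and inserts in the affected node become slower. The sketch refits the model on the poisoned keys to show that retraining alone does not help when the key distribution is far from linear.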
