Approximate nearest neighbor (ANN) search in AI systems increasingly handles sensitive data on third-party infrastructure. Trusted execution environments (TEEs) offer protection, but cost-efficient deployments must rely on external SSDs, whose disk access patterns leak user queries to the host. Oblivious RAM (ORAM) can hide these access patterns, but at a high cost: when paired with existing disk-based ANN search techniques, it makes poor use of SSD resources, yielding high latency and poor cost-efficiency. The core challenge for efficient oblivious ANN search over SSDs is balancing two resources: bandwidth and access count. The state-of-the-art ORAM-ANN design minimizes access count at the ANN level and bandwidth at the ORAM level, each at the expense of the other, so the combined system overutilizes both resources. We propose inverting this design, minimizing bandwidth consumption in the ANN layer and access count in the ORAM layer, because each component is better suited to its new role: ANN's inherent approximation allows greater bandwidth efficiency, while ORAM has no fundamental lower bound on access count (unlike bandwidth). To this end, we propose Onyx, a cost-efficient approach with two new co-designed components. Onyx-ANNS introduces a compact intermediate representation that proactively prunes the majority of bandwidth-intensive accesses without hurting recall, and Onyx-ORAM proposes a locality-aware shallow tree design that reduces access count while remaining compatible with bandwidth-efficient ORAM techniques. Compared to the state-of-the-art oblivious ANN search system, Onyx achieves $1.7-9.9\times$ lower cost and $2.3-12.3\times$ lower latency.