The Hausdorff distance is a fundamental measure for comparing sets of vectors, widely used in database theory and geometric algorithms. Computing it exactly is expensive, however, which often makes it impractical for large-scale applications such as multi-vector databases. In this paper, we introduce an approximation framework that estimates the Hausdorff distance efficiently while maintaining rigorous error bounds. Our approach uses approximate nearest-neighbor (ANN) search to construct a surrogate function that preserves essential geometric properties at significantly lower computational cost. We give a formal analysis of approximation accuracy, deriving both worst-case and expected error bounds. We also establish theoretical guarantees on the stability of our method under translation, rotation, and scaling, and quantify the impact of non-uniform scaling on approximation quality. This work provides a principled foundation for integrating Hausdorff distance approximations into large-scale data retrieval and similarity search, combining computational efficiency with theoretical correctness.
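To make the idea of a nearest-neighbor surrogate concrete, the following is a minimal sketch, not the method analyzed in this paper. It computes the exact Hausdorff distance by brute force and then swaps in an approximate nearest-neighbor routine. The helper names (`nn_dist_exact`, `make_subsample_nn`, `directed_hausdorff`, `hausdorff`) and the subsampling stand-in for an ANN index are assumptions made purely for illustration.

```python
import math
import random


def nn_dist_exact(q, points):
    """Exact distance from q to its nearest neighbor in points (brute force)."""
    return min(math.dist(q, p) for p in points)


def make_subsample_nn(points, keep, seed=0):
    """Hypothetical ANN stand-in: search only a fixed random subsample.

    The returned distance is never smaller than the exact one, because the
    subsample is a subset of the full point set.
    """
    rng = random.Random(seed)
    sample = rng.sample(points, keep)
    return lambda q: nn_dist_exact(q, sample)


def directed_hausdorff(A, nn_to_B):
    """Max over a in A of the (possibly approximate) distance from a to B."""
    return max(nn_to_B(a) for a in A)


def hausdorff(A, B, nn_to_A, nn_to_B):
    """Symmetric Hausdorff distance: the larger of the two directed values."""
    return max(directed_hausdorff(A, nn_to_B), directed_hausdorff(B, nn_to_A))


A = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
B = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]

# Exact value: every nearest-neighbor query scans the full opposite set.
exact = hausdorff(A, B,
                  lambda q: nn_dist_exact(q, A),
                  lambda q: nn_dist_exact(q, B))

# Surrogate: every query scans only two of the three opposite points.
approx = hausdorff(A, B,
                   make_subsample_nn(A, keep=2),
                   make_subsample_nn(B, keep=2))

print(exact, approx)
```

Because the stand-in can only overestimate each nearest-neighbor distance, the surrogate never falls below the exact value. More generally, if every query returned a neighbor within a factor (1 + ε) of the true nearest distance, the resulting maximum would lie within the same factor of the exact Hausdorff distance. This is a standard observation and is not one of the error bounds derived in the paper.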