Optimization tasks over relational data, such as clustering, often suffer from the prohibitive cost of the join operations needed to access the full dataset. While geometric data structures such as BBD trees yield fast approximation algorithms in the standard computational setting, it is unclear how to apply them to relational data, because the join output can be far larger than the input relations. In this paper, we introduce a framework that leverages geometric insights to design faster algorithms when the data is given as the result of a join query over a relational database. Our core contribution is the RBBD tree, a randomized variant of the BBD tree tailored to the relational setting. Rather than constructing the RBBD tree in full, we use efficient sampling and counting techniques over relational joins to expand it on the fly, maintaining only the parts that are needed. This allows us to simulate geometric query procedures without materializing the join result. As an application, we present algorithms for relational $k$-center, $k$-means, and $k$-median clustering that improve the running time of the state of the art by a factor of $k$ while maintaining the same approximation guarantees. Our method is general and can be applied to various optimization problems in the relational setting.
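To make the idea of on-the-fly expansion concrete, the following Python sketch builds a tree whose nodes are created only when a range-counting query reaches them. It is a simplified illustration under stated assumptions, not the RBBD tree itself: the splitting rule is a plain kd-style cut at a sampled median, and `ListOracle`, `LazyNode`, and `range_count` are hypothetical names introduced here. The oracle is backed by an explicit list of points purely for demonstration; in the relational setting it would be replaced by counting and sampling procedures that work over the join without materializing it.

```python
import random
from typing import List, Optional, Tuple

Point = Tuple[float, ...]
Box = Tuple[Point, Point]  # (lower, upper) corners; half-open: lower <= x < upper


def in_box(p: Point, box: Box) -> bool:
    return all(l <= x < h for l, x, h in zip(box[0], p, box[1]))


def intersect(a: Box, b: Box) -> Optional[Box]:
    lo = tuple(max(x, y) for x, y in zip(a[0], b[0]))
    hi = tuple(min(x, y) for x, y in zip(a[1], b[1]))
    return (lo, hi) if all(l < h for l, h in zip(lo, hi)) else None


class ListOracle:
    """Toy stand-in for counting/sampling over a join (hypothetical interface)."""

    def __init__(self, points: List[Point], seed: int = 0):
        self.points, self.rng = points, random.Random(seed)

    def count(self, box: Box) -> int:
        return sum(in_box(p, box) for p in self.points)

    def sample(self, box: Box) -> Point:
        # Only called on boxes known to be non-empty.
        return self.rng.choice([p for p in self.points if in_box(p, box)])


class LazyNode:
    created = 0  # number of nodes instantiated so far

    def __init__(self, box, oracle, depth=0, size=None):
        self.box, self.oracle, self.depth = box, oracle, depth
        self.size = oracle.count(box) if size is None else size
        self.children = None  # None means "not expanded yet"
        LazyNode.created += 1

    def expand(self, leaf_size=4, draws=9):
        if self.children is not None:
            return
        self.children = []  # default: treat this node as a leaf
        if self.size <= leaf_size:
            return
        lo, hi = self.box
        axis = self.depth % len(lo)
        # Approximate the median along `axis` from a few random samples.
        coords = sorted(self.oracle.sample(self.box)[axis] for _ in range(draws))
        cut = coords[draws // 2]
        left = (lo, hi[:axis] + (cut,) + hi[axis + 1:])
        right = (lo[:axis] + (cut,) + lo[axis + 1:], hi)
        n_left = self.oracle.count(left)
        if 0 < n_left < self.size:  # otherwise the cut is degenerate; stay a leaf
            self.children = [
                LazyNode(left, self.oracle, self.depth + 1, n_left),
                LazyNode(right, self.oracle, self.depth + 1, self.size - n_left),
            ]


def range_count(node: LazyNode, query: Box) -> int:
    inter = intersect(node.box, query)
    if inter is None or node.size == 0:
        return 0
    if inter == node.box:  # node lies entirely inside the query
        return node.size
    node.expand()  # expand only nodes that straddle the query boundary
    if not node.children:  # leaf: fall back to the oracle on the overlap
        return node.oracle.count(inter)
    return sum(range_count(child, query) for child in node.children)


rng = random.Random(1)
points = [(rng.random(), rng.random()) for _ in range(500)]
root = LazyNode(((0.0, 0.0), (1.0, 1.0)), ListOracle(points))
query = ((0.2, 0.2), (0.6, 0.7))
answer = range_count(root, query)
print(answer, "points in query;", LazyNode.created, "nodes created")
```

In this toy version the oracle could answer the query directly, so the sketch is only meant to show the mechanics: nodes fully inside or fully outside the query are never expanded, and the tree grows only along the query boundary.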