Graph workloads pose a particularly challenging problem for query optimizers. They typically feature large queries composed entirely of many-to-many joins with complex correlations. This puts significant stress on traditional cardinality estimation methods, which generally suffer catastrophic errors when estimating the size of queries with only a handful of joins. To overcome this, we propose COLOR, a framework for subgraph cardinality estimation that applies insights from graph compression theory to produce a compact summary capturing the global topology of the data graph. Further, we identify several key optimizations that make estimation over this summary tractable even for large query graphs. We then evaluate several designs within this framework and find that they improve accuracy by up to $10^3\times$ over all competing methods while maintaining fast inference, a small memory footprint, efficient construction, and graceful degradation under updates.
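The abstract does not specify how the summary is built or queried. As a rough illustration of the general idea only, the hypothetical Python sketch below compresses a directed graph with a vertex coloring into per-color vertex counts and per-color-pair edge counts, then estimates the number of 2-hop paths $a \to b \to c$ by assuming degrees are uniform within each color. This is an assumed, simplified estimator, not COLOR's actual method; the names `build_summary`, `estimate_two_paths`, and `exact_two_paths` are illustrative. It shows why the choice of coloring matters: a single color ignores topology, while grouping structurally equivalent vertices loses no accuracy.

```python
from collections import Counter


def build_summary(edges, coloring):
    """Hypothetical summary: vertex count per color, edge count per (src color, dst color)."""
    sizes = Counter(coloring.values())
    pair_edges = Counter((coloring[u], coloring[v]) for u, v in edges)
    return sizes, pair_edges


def estimate_two_paths(sizes, pair_edges):
    """Estimate #(a -> b -> c) assuming every vertex of a color has the same in/out degree.

    For a color with n vertices, E_in incoming and E_out outgoing edges, the uniform
    estimate of paths through that color is n * (E_in / n) * (E_out / n) = E_in * E_out / n.
    """
    into, out_of = Counter(), Counter()
    for (src, dst), m in pair_edges.items():
        out_of[src] += m
        into[dst] += m
    return sum(into[c] * out_of[c] / n for c, n in sizes.items())


def exact_two_paths(edges):
    """Exact count: sum over vertices of in-degree times out-degree."""
    indeg, outdeg = Counter(), Counter()
    for u, v in edges:
        outdeg[u] += 1
        indeg[v] += 1
    return sum(indeg[v] * outdeg[v] for v in indeg)


edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]

# One color for everything: the summary forgets all topology.
single = build_summary(edges, {v: "x" for v in range(5)})
# Vertices 1 and 2 are structurally equivalent, so merging them loses nothing.
grouped = build_summary(edges, {0: "a", 1: "b", 2: "b", 3: "c", 4: "d"})

exact = exact_two_paths(edges)          # 4
coarse = estimate_two_paths(*single)    # 5.0
refined = estimate_two_paths(*grouped)  # 4.0
```

A finer coloring makes the summary larger but the uniformity assumption more accurate; the trade-off between summary size and estimation error is the kind of design space the abstract refers to, though the actual framework handles general query graphs rather than only 2-hop paths.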