Cardinality-estimation (CE) research ranks estimators by q-error, yet it is well known that q-error is an imperfect proxy for query-plan quality. We give a measurement-driven account of when it is a good proxy, when it is not, and why. Modeling plan selection as an argmin over a piecewise-linear cost landscape, we find that plan regret (the cost of the chosen plan relative to the optimal plan, both evaluated under true cardinalities) is governed by plan-cost geometry in a regime-dependent way. (i) For small errors, a true-point condition number kappa predicts regret and out-predicts q-error; its predictive power decays to zero as error grows, as that of any local linearization must. (ii) For large errors, the regime in which deployed learned estimators operate, an estimator-independent average-case sub-optimality measure ACS-infinity predicts which queries are regret-prone (Spearman rho ~ 0.54 on STATS-CEB), while q-error is nearly uninformative at the query level (rho ~ 0.05). (iii) The worst case is Haritsa's maximum sub-optimality (MSO). These three measures are a single cost-ratio spectrum under three different weightings. We prove a limit law ACS-infinity = sum_k r_k pi_k with cardinality-independent combinatorial weights. We validate every claim on STATS-CEB and JOB-light with four released estimators under pre-registered decision rules, and we confirm on real PostgreSQL runtime that ACS-infinity predicts regret where q-error does not. The contribution is conceptual and empirical rather than a new estimator: an average-case companion to worst-case robust query optimization, and a characterization of when an accuracy metric tracks plan quality. Code and the full pre-registration are public.
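To make the gap between q-error and plan regret concrete, here is a minimal Python sketch of the argmin-over-linear-costs model described above. It is an illustration under stated assumptions, not the released code: the two plans, their cost coefficients, the cardinality vectors, and the helper names (`q_error`, `plan_cost`, `plan_regret`) are all hypothetical, and regret is taken to be a cost ratio, which is one reading of "cost of the chosen plan relative to the optimal".

```python
def q_error(est, true):
    """Standard q-error for one cardinality: max(est/true, true/est)."""
    return max(est / true, true / est)


def plan_cost(coeffs, card):
    """Hypothetical linear cost model: weighted sum of cardinalities."""
    return sum(w * c for w, c in zip(coeffs, card))


def plan_regret(plans, est_card, true_card):
    """Cost ratio of the plan picked under est_card to the true optimum.

    The optimizer is modeled as an argmin over plan costs evaluated at
    the estimated cardinalities; both costs in the ratio are then
    evaluated at the true cardinalities.
    """
    chosen = min(plans, key=lambda p: plan_cost(p, est_card))
    best = min(plan_cost(p, true_card) for p in plans)
    return plan_cost(chosen, true_card) / best


# Toy setup (assumed, not taken from the paper): two plans over two
# sub-query cardinalities (c1, c2).
PLANS = [(1.0, 10.0), (5.0, 1.0)]
TRUE = (100.0, 100.0)

under = (100.0, 10.0)    # c2 underestimated by 10x
over = (1000.0, 1000.0)  # both cardinalities overestimated by 10x

for name, est in [("under", under), ("over", over)]:
    worst_q = max(q_error(e, t) for e, t in zip(est, TRUE))
    print(name, "max q-error:", worst_q,
          "regret:", plan_regret(PLANS, est, TRUE))
```

In this toy setup both estimates have the same maximum q-error of 10, yet only the underestimate flips the argmin to the worse plan. The uniform overestimate scales every plan cost by the same factor and leaves the choice unchanged, which is the kind of geometry-dependent behavior that q-error alone cannot capture.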