When Does q-error Predict Plan Regret? Three Regimes of Cardinality-Estimation Error

Cardinality-estimation (CE) research ranks estimators by q-error, yet it is well known that q-error is an imperfect proxy for query-plan quality. We give a measurement-driven account of when it is a good proxy, when it is not, and why. Modeling plan selection as an argmin over a piecewise-linear cost landscape, we find that plan regret (the cost of the chosen plan relative to the optimal plan, both evaluated under true cardinalities) is governed by plan-cost geometry in a regime-dependent way. (i) For small errors, a condition number kappa evaluated at the true cardinalities predicts regret and out-predicts q-error; its predictive power decays to zero as error grows, as that of any local linearization must. (ii) For large errors, the regime in which deployed learned estimators operate, an estimator-independent average-case sub-optimality measure, ACS-infinity, predicts which queries are regret-prone (Spearman rho ~ 0.54 on STATS-CEB), while q-error is nearly uninformative at the query level (rho ~ 0.05). (iii) In the worst case, the governing quantity is Haritsa's maximum sub-optimality (MSO). All three measures are the same cost-ratio spectrum under three different weightings. We prove a limit law, ACS-infinity = sum_k r_k pi_k, with cardinality-independent combinatorial weights. We validate every claim on STATS-CEB and JOB-light with four released estimators under pre-registered decision rules, and we confirm on real PostgreSQL runtime that ACS-infinity predicts regret where q-error does not. The contribution is conceptual and empirical rather than a new estimator: an average-case companion to worst-case robust query optimization, and a characterization of when an accuracy metric tracks plan quality. Code and the full pre-registration are public.
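
To make the gap between q-error and plan regret concrete, the following is a minimal sketch of plan selection as an argmin over linear plan costs, whose lower envelope is piecewise linear. Everything in it is an illustrative assumption rather than the paper's setup: the cost matrix `PLAN_COSTS`, the two-dimensional cardinality vectors, the helper names, and the choice to express regret as a cost ratio.

```python
import numpy as np

# Hypothetical toy cost model (an assumption, not taken from the paper):
# row p gives the per-tuple cost coefficients of plan p, so that
# cost_p(c) = PLAN_COSTS[p] @ c for a cardinality vector c.
PLAN_COSTS = np.array([
    [1.0, 0.1],  # plan 0: cheap when the first cardinality is small
    [0.2, 1.0],  # plan 1: cheap when the second cardinality is small
    [0.5, 0.5],  # plan 2: a compromise between the two
])


def q_error(estimate, truth):
    """Largest multiplicative over- or under-estimate across sub-expressions."""
    estimate = np.asarray(estimate, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return float(np.max(np.maximum(estimate / truth, truth / estimate)))


def choose_plan(cardinalities):
    """Index of the cheapest plan under the given cardinalities (argmin)."""
    return int(np.argmin(PLAN_COSTS @ np.asarray(cardinalities, dtype=float)))


def plan_regret(estimate, truth):
    """True cost of the plan chosen under `estimate`, over the optimal true cost."""
    true_costs = PLAN_COSTS @ np.asarray(truth, dtype=float)
    return float(true_costs[choose_plan(estimate)] / true_costs.min())


truth = [100, 1000]
uniform_overestimate = [1000, 10000]  # both cardinalities off by 10x, same direction
skewed_estimate = [400, 250]          # both off by 4x, opposite directions

for name, est in [("uniform", uniform_overestimate), ("skewed", skewed_estimate)]:
    print(name, "q-error:", q_error(est, truth), "regret:", plan_regret(est, truth))
```

In this toy instance the uniform overestimate has the larger q-error but leaves the plan ranking unchanged, so its regret ratio is 1, while the skewed estimate has the smaller q-error yet flips the argmin to a more expensive plan. This is only an illustration of why an accuracy metric and plan quality can disagree; it does not reproduce kappa, ACS-infinity, or MSO.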
