We analyse three KV cache quantization schemes under a fair bit budget: \textbf{KV} (scalar MSE baseline), \textbf{KQV} (WHT + MSE on $K$; WHT + MSE + QJL on $V$), and \textbf{QKQV} (WHT + MSE + QJL on both). Starting from the Beta distribution on the hypersphere, we trace how QJL on $K$ inflates inner product variance by a factor of $\pi/2$, an effect that softmax amplifies nonlinearly via Jensen's inequality, and we present statistical inference and information metrics to highlight the practical differences. Three empirical findings emerge. (1)~At $n=4$ (the practically dominant budget), KQV wins on every measure (KL divergence, geometric $K$ error, and 6D distance) across all distributions and ranks tested. (2)~The $K$--$V$ asymmetry is unconditional: QKQV is consistently worse than KQV in KL divergence at every budget and distribution. (3)~A budget-dependent crossover exists: QKQV achieves better geometric $K$ reconstruction at $n \in \{2,3,5\}$ and KQV at $n \in \{4,6\}$, a pattern that is invariant to rank and tail weight and that poses an open rate-distortion problem. The divergence $\mathrm{KL}(p_{\mathrm{ref}} \| p_{\mathrm{quant}})$, which depends only on $K$ by construction, links $K$ direction error to routing corruption and output collapse. We give a sufficient condition under which the Jensen mechanism amplifies error superlinearly through the softmax. At $n \in \{2,3,5\}$, QKQV wins geometrically because this condition does not bind. At $n=4$, the elevated $K$ error and KL divergence of QKQV strongly suggest that the Jensen mechanism is the operative cause of the crossover.
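
As a minimal illustration of the Jensen step, consider the following sketch. It rests on assumptions that are ours and not results of the analysis: a single logit $s$ is perturbed by zero-mean Gaussian noise $\varepsilon$ with variance $\sigma^2$, and the $\pi/2$ inflation is taken to act multiplicatively on $\sigma^2$. The symbols $s$, $\varepsilon$, and $\sigma^2$ are introduced here for illustration only. By the Gaussian moment generating function,
\[
  \mathrm{E}\left[e^{s+\varepsilon}\right] = e^{s}\, e^{\sigma^2/2} \;\ge\; e^{s},
\]
so unbiased logit noise still biases the unnormalised softmax weight upward, by a factor that is exponential in the variance. Under the assumed inflation $\sigma^2 \mapsto (\pi/2)\,\sigma^2$, this factor becomes
\[
  e^{\pi\sigma^2/4} = e^{\sigma^2/2} \cdot e^{(\pi-2)\,\sigma^2/4},
\]
so the extra bias attributable to QJL on $K$ grows exponentially, not linearly, in $\sigma^2$. This is the sense in which a constant-factor variance penalty can turn into a disproportionate change in the attention distribution.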