We present T3C, a train-once, test-time budget-conditioned compression framework that exposes rank and precision as controllable deployment knobs. T3C combines elastic tensor factorization (maintained up to a maximal rank) with rank-tied mixed-precision quantization and a lightweight controller that maps a latency/energy/size budget token to per-layer rank/bit assignments; the policy snaps to hardware-aligned profiles and is monotone in the budget. A fast, layerwise consistency certificate, computed from spectral proxies and activation statistics, upper-bounds logit drift and regularizes training, yielding a practical reliability signal with negligible overhead. On ImageNet-1k, T3C shifts the vision Pareto frontier: for ResNet-50 at matched accuracy (≤ 0.5% drop), p50 latency is 1.18 ms with a 38 MB model, outperforming PTQ-8b (1.44 ms, 88 MB); for ViT-B/16, T3C reaches 2.30 ms p50 latency with a 59 MB model, improving over strong PTQ/QAT baselines. A single T3C checkpoint therefore provides predictable, certificate-backed accuracy-latency-size trade-offs on demand across devices.
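To make the controller's two stated properties concrete (snapping to hardware-aligned profiles and monotonicity in the budget), the following is a minimal sketch, not the T3C controller itself. The profile table, the normalized budget in [0, 1], the per-layer `sensitivity` weights, and the function names are all hypothetical and chosen only for illustration; the learned controller described above may differ substantially.

```python
from bisect import bisect_right
from typing import List, Tuple

# Hypothetical hardware-aligned profiles, sorted by relative cost.
# Each entry is (relative_cost, rank, bits); the values are illustrative only.
PROFILES: List[Tuple[float, int, int]] = [
    (0.25, 16, 4),
    (0.50, 32, 4),
    (0.75, 64, 8),
    (1.00, 128, 8),
]
COSTS = [cost for cost, _, _ in PROFILES]


def snap_to_profile(budget: float) -> Tuple[int, int]:
    """Return (rank, bits) of the costliest profile that fits the budget.

    Budgets below the cheapest profile fall back to the cheapest one.
    """
    idx = max(bisect_right(COSTS, budget) - 1, 0)
    _, rank, bits = PROFILES[idx]
    return rank, bits


def assign_layers(budget: float, sensitivity: List[float]) -> List[Tuple[int, int]]:
    """Map one global budget to per-layer (rank, bits) assignments.

    Each layer scales the budget by its (assumed positive) sensitivity weight,
    capped at 1.0, so every layer's assignment is non-decreasing in the budget.
    """
    return [snap_to_profile(min(1.0, budget * s)) for s in sensitivity]


if __name__ == "__main__":
    for b in (0.3, 0.6, 1.0):
        print(b, assign_layers(b, sensitivity=[1.0, 0.5, 1.5]))
```

Because the profile table is sorted and each layer's effective budget grows with the global budget, raising the budget can never lower any layer's rank or bit-width, which is the monotonicity property in miniature.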