Low-bit quantization enables deployment of image restoration (IR) networks on resource-constrained devices, but it introduces rounding noise that disproportionately degrades high-frequency regions such as edges and fine textures. Existing knowledge distillation (KD) methods apply the distillation signal uniformly across all spatial locations, overlooking the fact that reconstruction difficulty varies across image regions. To address this, we propose SPARK (Spatial Policy-driven Adaptive Reinforcement Learning for Knowledge Distillation), a framework that adaptively allocates distillation effort using a lightweight reinforcement learning (RL) policy network. At each training step, a difficulty feature extractor computes four signals (Laplacian variance, pixel variance, student reconstruction error, and teacher-student knowledge gap). These signals are fed into a compact policy CNN that produces a stochastic spatial weight map, and this map modulates the KD loss during quantization-aware training (QAT). SPARK is agnostic to the IR task, adds no inference cost, and integrates into any existing QAT pipeline without architectural changes. Experiments on benchmark datasets show that SPARK consistently outperforms post-training quantization (PTQ), QAT, and state-of-the-art (SOTA) KD approaches across multiple student architectures, achieving the reconstruction quality closest to the full-precision teacher under tight computational constraints.
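To make the data flow of the spatial weighting step concrete, the following is a minimal NumPy sketch for a single-channel image. The abstract names the four difficulty signals and states that a policy produces a stochastic weight map that modulates the KD loss, but it does not specify window sizes, normalization, the policy architecture, the sampling distribution, or the reward. Everything of that kind here is an assumption: the 3x3 local windows, per-signal standardization, a per-pixel linear map (equivalent to a single 1x1 convolution) standing in for the compact policy CNN, Gaussian noise on the logits, a mean-1 normalization of the weights, and an L1 distillation loss. The function names `difficulty_features`, `sample_weight_map`, and `weighted_kd_loss` are hypothetical. The returned log-probability is only a hook for a REINFORCE-style update; no reward is defined.

```python
import numpy as np


def local_mean(x, k=3):
    """Mean over a k x k window (edge-padded); output has the same shape as x."""
    pad = k // 2
    xp = np.pad(x, pad, mode="edge")
    h, w = x.shape
    out = np.zeros(x.shape, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += xp[dy:dy + h, dx:dx + w]
    return out / (k * k)


def local_var(x, k=3):
    """Local variance E[x^2] - E[x]^2 over a k x k window, clipped at 0."""
    m = local_mean(x, k)
    return np.maximum(local_mean(x * x, k) - m * m, 0.0)


def laplacian(x):
    """4-neighbour discrete Laplacian with edge padding."""
    xp = np.pad(x, 1, mode="edge")
    return (xp[:-2, 1:-1] + xp[2:, 1:-1] + xp[1:-1, :-2] + xp[1:-1, 2:]
            - 4.0 * xp[1:-1, 1:-1])


def difficulty_features(target, student, teacher, k=3):
    """Stack the four per-pixel difficulty signals into an array of shape (4, H, W)."""
    feats = np.stack([
        local_var(laplacian(target), k),  # Laplacian variance (edges, texture)
        local_var(target, k),             # pixel variance
        np.abs(student - target),         # student reconstruction error
        np.abs(teacher - student),        # teacher-student knowledge gap
    ])
    # Assumption: standardize each signal so that no single one dominates.
    mu = feats.mean(axis=(1, 2), keepdims=True)
    sd = feats.std(axis=(1, 2), keepdims=True) + 1e-8
    return (feats - mu) / sd


def sample_weight_map(feats, theta, bias=0.0, sigma=0.5, rng=None):
    """Hypothetical per-pixel linear policy with Gaussian exploration noise."""
    rng = np.random.default_rng() if rng is None else rng
    mean_logit = np.tensordot(theta, feats, axes=1) + bias  # shape (H, W)
    eps = rng.standard_normal(mean_logit.shape)
    logit = mean_logit + sigma * eps                        # stochastic action
    w = np.exp(logit)
    w = w / w.mean()  # assumption: mean-1 weights keep the loss scale unchanged
    # Gaussian log-density of the sampled logits, summed over pixels.
    log_prob = np.sum(-0.5 * eps ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi)))
    return w, log_prob


def weighted_kd_loss(student, teacher, weights):
    """Spatially weighted L1 distillation loss (L1 is an assumption)."""
    return float(np.mean(weights * np.abs(student - teacher)))


# Usage on synthetic data: the teacher is closer to the target than the student.
rng = np.random.default_rng(0)
target = rng.random((16, 16))
teacher = target + 0.01 * rng.standard_normal((16, 16))
student = target + 0.05 * rng.standard_normal((16, 16))
feats = difficulty_features(target, student, teacher)
w, logp = sample_weight_map(feats, theta=np.array([0.5, 0.2, 0.5, 0.5]), rng=rng)
loss = weighted_kd_loss(student, teacher, w)
```

Normalizing the weight map to mean 1 is one simple way to let the policy redistribute distillation effort across pixels without inflating or shrinking the overall KD loss. Because the policy only reweights the training loss, nothing in this sketch touches the student network at inference time, which is consistent with the zero-inference-cost claim.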