DeltaPrompts: Escaping the Zero-Delta Trap in Multimodal Distillation

Distillation enables compact Vision-Language Models (VLMs) to acquire strong reasoning capabilities, yet the prompts driving this process are typically chosen via simple heuristics or aggregated from off-the-shelf datasets. We reveal a critical inefficiency in this approach: up to 69% of the prompts in standard chart and document reasoning datasets are effectively zero-delta, meaning the teacher and student already induce the same answer distribution. Training on these prompts provides minimal learning signal, causing student improvement to saturate rapidly regardless of data scale. To escape the zero-delta trap, we return to first principles: distillation fundamentally minimizes distributional divergence, so a prompt is valuable only if it exposes a functional capability gap between the teacher and student. We quantify this gap through answer divergence ($Δ$) and demonstrate that non-zero divergence is critical for effective scaling. Building on this insight, we propose a staged synthesis pipeline that repurposes existing datasets as seeds and actively targets student failure modes to produce better prompts. The result is DeltaPrompts, a diverse dataset of 200k synthetic, high-divergence reasoning problems. We evaluate DeltaPrompts in three distinct settings: on-policy distillation with the target teacher-student pair, transfer to a novel model family without regenerating the data, and off-policy fine-tuning of a non-reasoning model. Across all three settings, DeltaPrompts drives substantial gains, yielding up to 15% relative improvement, averaged over 10 benchmarks spanning chart, document, and perception-centric reasoning, even on top of a highly optimized reasoning model (e.g., Qwen3-VL-8B-Thinking).
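To make the zero-delta notion concrete, the following is a minimal sketch of how a per-prompt answer divergence could be estimated from sampled answers. The abstract does not specify how $Δ$ is computed, so the choice of total variation distance over empirical answer distributions is an assumption made here for illustration only, as are the function names (`answer_distribution`, `answer_divergence`, `split_by_delta`), the prompt IDs, and the toy answers.

```python
from collections import Counter


def answer_distribution(answers):
    """Empirical distribution over final answers from a list of sampled answers."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(answers)
    total = len(answers)
    return {answer: count / total for answer, count in counts.items()}


def answer_divergence(teacher_answers, student_answers):
    """Total variation distance between teacher and student answer distributions.

    Assumption: this is an illustrative stand-in for the divergence; the
    source does not state which divergence measure it uses.
    """
    p = answer_distribution(teacher_answers)
    q = answer_distribution(student_answers)
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in support)


def split_by_delta(samples, eps=1e-9):
    """Partition prompt IDs into zero-delta and non-zero-delta groups.

    samples maps a (hypothetical) prompt ID to a pair
    (teacher_answers, student_answers).
    """
    zero, nonzero = [], []
    for prompt_id, (teacher, student) in samples.items():
        if answer_divergence(teacher, student) <= eps:
            zero.append(prompt_id)
        else:
            nonzero.append(prompt_id)
    return zero, nonzero


# Toy data: four sampled answers per model per prompt (made up for illustration).
samples = {
    "chart_001": (["42", "42", "42", "42"], ["42", "42", "42", "42"]),
    "chart_002": (["17", "17", "17", "17"], ["17", "12", "12", "9"]),
}
zero_delta, high_delta = split_by_delta(samples)
```

Under these assumptions, `chart_001` is zero-delta because both models produce the same answer distribution, while `chart_002` has a divergence of 0.75 and would be kept as a candidate with a real teacher-student gap.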
