When performing reasoning tasks with user-specific requirements, such as strict output formats, large language models (LLMs) often prioritize reasoning over adherence to detailed instructions. Addressing this by fine-tuning LLMs on supervised datasets is impractical due to high computational costs and limited parameter access. To tackle this, we propose DICE, a lightweight framework that guides small language models (SLMs) to refine LLMs' outputs through chain-of-thought (CoT) correction. DICE decouples the process: it first prompts LLMs to generate natural language responses, then uses trained SLMs to analyze and refine these outputs to meet structured output specifications. This framework preserves LLMs' broad knowledge and reasoning capabilities while ensuring that outputs conform to user demands. Specifically, DICE first constructs structured CoT adaptation datasets via a two-stage method and then applies a dual-tuning strategy to fine-tune SLMs to generate structured outputs in an analyze-then-answer pattern. Experiments demonstrate that DICE improves the average format accuracy and content correctness of LLM outputs by 35.4\% and 29.4\%, respectively, achieving state-of-the-art (SOTA) performance over competitive baselines.
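The decoupled generate-then-refine pipeline described above can be sketched as follows. This is a minimal illustration with hypothetical stub functions (`llm_generate`, `slm_refine` and the format spec are placeholders, not the authors' implementation): the LLM produces a free-form answer, and the trained SLM rewrites it in an analyze-then-answer pattern that satisfies the structured output specification.

```python
# Hypothetical sketch of a DICE-style decoupled pipeline (stubs, not the paper's code).

def llm_generate(question: str) -> str:
    # Placeholder for an LLM call: returns a free-form natural-language answer,
    # which may ignore the user's format requirements.
    return "The answer is forty-two, since 6 x 7 = 42."

def slm_refine(question: str, draft: str, fmt_spec: str) -> str:
    # Placeholder for the fine-tuned SLM: it first analyzes the draft,
    # then emits an answer conforming to the requested structured format.
    analysis = f"Analysis: the draft answers '{question}' but violates the spec: {fmt_spec}."
    structured_answer = '{"answer": 42}'  # conforms to the (hypothetical) spec
    return f"{analysis}\n{structured_answer}"

def dice_pipeline(question: str, fmt_spec: str) -> str:
    draft = llm_generate(question)                 # stage 1: LLM reasoning
    return slm_refine(question, draft, fmt_spec)   # stage 2: SLM CoT correction

result = dice_pipeline("What is 6 x 7?", "a JSON object with the key 'answer'")
print(result.splitlines()[-1])
```

In a real deployment the two stubs would be replaced by calls to the frozen LLM and the fine-tuned SLM; the key design point is that the LLM's weights are never touched, and only the small refiner is trained.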