AI agents powered by reasoning models require access to sensitive user data. However, their reasoning traces are difficult to control, which can result in the unintended leakage of private information to external parties. We propose training models to follow instructions not only in the final answer, but also in their reasoning traces, potentially under different constraints. We hypothesize that improving their instruction-following abilities in the reasoning traces can improve their privacy-preservation skills. To demonstrate this, we fine-tune models on a new instruction-following dataset with explicit restrictions on reasoning traces. We further introduce a generation strategy that decouples reasoning and answer generation using separate LoRA adapters. We evaluate our approach on six models from two model families, ranging from 1.7B to 14B parameters, across two instruction-following benchmarks and two privacy benchmarks. Our method yields substantial improvements, achieving gains of up to 20.9 points in instruction-following performance and up to 51.9 percentage points on privacy benchmarks. These improvements, however, can come at the cost of task utility, due to the trade-off between reasoning performance and instruction-following abilities. Overall, our results show that improving instruction-following behavior in reasoning models can significantly enhance privacy, suggesting a promising direction for the development of future privacy-aware agents. Our code and data are available at https://github.com/UKPLab/arxiv2026-controllable-reasoning-models