As large language models increasingly mediate firm–customer interactions, firms face a tradeoff: the most capable models perform well but are costly and difficult to control at scale. Existing knowledge-distillation methods address this challenge by training weaker, deployable models to imitate frontier-model outputs; however, such open-loop approaches are poorly suited to interactive, multi-turn settings where responses must be sequenced coherently across conversational states. We propose a shift in what knowledge is distilled: from output imitation to contextual guidance. We develop a framework in which a superior teacher model constructs a reusable library of strategic textual guidance for scenarios the student model is likely to encounter. At deployment, the student retrieves context-specific guidance at inference time, enabling adaptive behavior without retraining. Using customer-service interactions, we show that this approach improves service quality and customer satisfaction relative to standard fine-tuning while maintaining alignment with firm policies. These results position inference-time textual guidance as a scalable and controllable approach to distillation for interactive AI agents in marketing settings.