As large dialogue models become commonplace in practice, the problems of high compute requirements for training and inference and of large memory footprints persist. In this work, we present AUTODIAL, a multi-task dialogue model that addresses the challenges of deploying dialogue models. AUTODIAL uses parallel decoders to perform tasks such as dialogue act prediction, domain prediction, intent prediction, and dialogue state tracking. Using classification decoders instead of generative decoders allows AUTODIAL to significantly reduce its memory footprint and achieve faster inference than an existing generative approach, SimpleTOD. We demonstrate that AUTODIAL provides 3-6x speedups during inference while having 11x fewer parameters than SimpleTOD on three dialogue tasks. Our results show that extending current dialogue models with parallel decoders can be a viable alternative for deploying them in resource-constrained environments.
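The idea of parallel classification decoders over a shared representation can be illustrated with a minimal sketch. This is not the AUTODIAL implementation: the label sets, the hidden size, the randomly initialized linear heads, and the names `make_head`, `classify`, and `predict_all` are all assumptions made for illustration. The sketch only shows why a classification head needs a single scoring pass per task, whereas a generative decoder would emit its output token by token.

```python
import random

# Hypothetical label sets; the abstract names the tasks but not their labels.
TASK_LABELS = {
    "dialogue_act": ["inform", "request", "confirm"],
    "domain": ["restaurant", "hotel", "taxi"],
    "intent": ["book", "find", "cancel"],
}

HIDDEN_SIZE = 8  # assumed size of the shared encoder output


def make_head(num_labels, hidden_size, rng):
    """Create one linear classification head as a (weights, biases) pair."""
    weights = [
        [rng.uniform(-1, 1) for _ in range(hidden_size)]
        for _ in range(num_labels)
    ]
    biases = [0.0] * num_labels
    return weights, biases


def classify(head, encoding):
    """Score every label in a single pass and return the argmax index."""
    weights, biases = head
    scores = [
        sum(w * x for w, x in zip(row, encoding)) + b
        for row, b in zip(weights, biases)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def predict_all(heads, encoding):
    """Run every task head on the same shared encoding.

    The heads do not depend on one another, so they could run in parallel;
    a plain loop is used here for clarity.
    """
    return {
        task: TASK_LABELS[task][classify(head, encoding)]
        for task, head in heads.items()
    }


rng = random.Random(0)  # fixed seed so the sketch is reproducible
heads = {
    task: make_head(len(labels), HIDDEN_SIZE, rng)
    for task, labels in TASK_LABELS.items()
}
# Stand-in for the output of a shared dialogue encoder (an assumption).
encoding = [rng.uniform(-1, 1) for _ in range(HIDDEN_SIZE)]
predictions = predict_all(heads, encoding)
print(predictions)
```

Each head adds only `num_labels * hidden_size` weights plus biases on top of the shared encoder, which gives an intuition for how classification decoders can keep the parameter count and inference cost low.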