As large dialogue models become commonplace in practice, the problems of high compute requirements for training and inference and of large memory footprints persist. In this work, we present AUTODIAL, a multi-task dialogue model that addresses the challenges of deploying dialogue models. AUTODIAL uses parallel decoders to perform tasks such as dialogue act prediction, domain prediction, intent prediction, and dialogue state tracking. Using classification decoders instead of generative decoders allows AUTODIAL to significantly reduce its memory footprint and achieve faster inference than an existing generative approach, SimpleTOD. We demonstrate that, on three dialogue tasks, AUTODIAL provides 3-6x inference speedups with 11x fewer parameters than SimpleTOD. Our results show that extending current dialogue models with parallel decoders can be a viable alternative for deploying them in resource-constrained environments.
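To make the idea of parallel classification decoders concrete, the following is a minimal sketch and not the AUTODIAL implementation. It assumes a shared encoder whose output is computed once and then read by several independent linear classification heads, one per task. The label sets, the toy `encode` function, the hidden size, and the random weights are all hypothetical stand-ins chosen for illustration.

```python
import random

# Hypothetical label sets; the real task vocabularies are not given in the text.
TASK_LABELS = {
    "dialogue_act": ["inform", "request", "confirm"],
    "domain": ["hotel", "restaurant", "taxi"],
    "intent": ["book", "find"],
}
HIDDEN_SIZE = 8  # assumed size of the shared encoding


def encode(utterance):
    """Toy stand-in for a shared encoder: normalized character-bucket counts."""
    vec = [0.0] * HIDDEN_SIZE
    for ch in utterance:
        vec[ord(ch) % HIDDEN_SIZE] += 1.0
    norm = max(len(utterance), 1)
    return [v / norm for v in vec]


def make_head(num_labels, rng):
    """One linear classification head: a num_labels x HIDDEN_SIZE weight matrix."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(HIDDEN_SIZE)]
            for _ in range(num_labels)]


def build_heads(seed=0):
    """Create one untrained (random) head per task, for illustration only."""
    rng = random.Random(seed)
    return {task: make_head(len(labels), rng)
            for task, labels in TASK_LABELS.items()}


def classify(head, hidden):
    """Score every label with a dot product and return the argmax index."""
    scores = [sum(w * h for w, h in zip(row, hidden)) for row in head]
    return max(range(len(scores)), key=scores.__getitem__)


def predict_all(utterance, heads):
    """Encode once, then let every head read the same encoding independently."""
    hidden = encode(utterance)
    return {task: TASK_LABELS[task][classify(head, hidden)]
            for task, head in heads.items()}


if __name__ == "__main__":
    print(predict_all("i need a hotel for two nights", build_heads()))
```

In this sketch each head produces its label with a single scoring step over a fixed label set, whereas a generative decoder emits its output token by token. That difference is the intuition behind the inference speedup described above, although the weights here are untrained and the predictions carry no meaning.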