Don't Shoot The Breeze: Topic Continuity Model Using Nonlinear Naive Bayes With Attention

Using Large Language Models (LLMs) as chatbots in diverse business scenarios often presents the challenge of maintaining topic continuity. Abrupt topic shifts can lead to poor user experiences and inefficient use of computational resources. In this paper, we present a topic continuity model that assesses whether a response aligns with the initial conversation topic. Our model is built by expanding the corresponding natural language understanding (NLU) model into quantifiable terms using a Naive Bayes approach. We then introduce an attention mechanism and a logarithmic nonlinearity to enhance its ability to capture topic continuity. This approach allows us to convert the NLU model into an interpretable analytical formula. In contrast to many NLU models constrained by token limits, our proposed model can handle conversations of any length with linear time complexity. Furthermore, the attention mechanism significantly improves the model's ability to identify topic continuity in complex conversations. According to our experiments, our model consistently outperforms traditional methods, particularly on lengthy and intricate conversations. This capability offers an opportunity to ensure the responsible and interpretable use of LLMs.

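The abstract does not state the model's formula, so the following is only a minimal sketch of the general idea: a Naive Bayes-style log-odds score per token, combined with an attention weighting, computed in a single linear pass over the response. Everything here is an assumption made for illustration: the function names `train_unigram` and `continuity_score`, the add-one smoothing, the softmax-over-magnitude attention weights, and the toy data. It should not be read as the authors' method.

```python
import math
from collections import Counter


def train_unigram(docs, vocab, alpha=1.0):
    """Smoothed unigram probabilities over a fixed vocabulary (hypothetical helper)."""
    counts = Counter(tok for doc in docs for tok in doc)
    total = sum(counts[t] for t in vocab) + alpha * len(vocab)
    return {t: (counts[t] + alpha) / total for t in vocab}


def continuity_score(tokens, p_topic, p_background, temperature=1.0):
    """Attention-weighted average of per-token log-likelihood ratios.

    Positive values suggest the response stays on topic, negative values
    suggest drift. Runs in time linear in len(tokens).
    """
    # Naive Bayes log-odds per token; out-of-vocabulary tokens are skipped.
    llrs = [
        math.log(p_topic[t] / p_background[t])
        for t in tokens
        if t in p_topic and t in p_background
    ]
    if not llrs:
        return 0.0
    # Illustrative attention: softmax over |log-odds|, so that strongly
    # informative tokens dominate the score (an assumption of this sketch).
    peak = max(abs(x) for x in llrs)
    weights = [math.exp((abs(x) - peak) / temperature) for x in llrs]
    norm = sum(weights)
    return sum(w * x for w, x in zip(weights, llrs)) / norm


# Toy data: an on-topic corpus (billing) and a background corpus.
topic_docs = [["refund", "invoice", "billing"], ["invoice", "payment", "refund"]]
background_docs = [["weather", "sunny", "rain"], ["football", "weather", "match"]]
vocab = {tok for doc in topic_docs + background_docs for tok in doc}

p_topic = train_unigram(topic_docs, vocab)
p_background = train_unigram(background_docs, vocab)

on_topic = continuity_score(["refund", "invoice"], p_topic, p_background)
off_topic = continuity_score(["weather", "rain"], p_topic, p_background)
```

In this toy setup the billing response receives a positive score and the weather response a negative one. Because each token is visited once and no context window is involved, the cost grows linearly with response length, which is the property the abstract emphasizes.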

Related content

Attention mechanism


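The attention mechanism was first proposed in the field of computer vision, but it really took off with the Google DeepMind paper "Recurrent Models of Visual Attention" [14], which applied attention to an RNN model for image classification. Bahdanau et al. then used a similar attention mechanism in "Neural Machine Translation by Jointly Learning to Align and Translate" [1] to perform translation and alignment jointly in a machine translation task; their work is generally regarded as the first to apply attention to NLP. Attention-based extensions of RNN models were subsequently applied to a wide range of NLP tasks. More recently, how to use attention in CNNs has also become an active research topic. The figure below shows the rough trend of progress in attention research.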