Recent years have seen considerable interest in conversational agents, driven by the rise of large language models (LLMs). Although they offer substantial advantages, LLMs also present significant risks, such as hallucination, which hinder their widespread deployment in industry. Moreover, low-resource languages, including African languages, remain underrepresented in these systems, which limits their performance in these languages. In this paper, we illustrate a more classical approach based on the modular architectures of Task-oriented Dialog Systems (ToDS), which offer better control over outputs. We propose a chatbot generation engine based on the Rasa framework and a robust methodology for projecting annotations onto the Wolof language using an in-house machine translation system. Evaluating a generated chatbot trained on the Amazon Massive dataset, we find that our Wolof intent classifier performs similarly to the one obtained for French, a resource-rich language. We also show that this approach is extensible to other low-resource languages, thanks to the intent classifier's language-agnostic pipeline, which simplifies the design of chatbots in these languages.