The rapid digitalization of customer service has intensified the demand for conversational agents capable of accurate and natural interaction. In the Algerian context, meeting this demand is complicated by the linguistic complexity of Darja, a dialect characterized by non-standardized orthography, extensive code-switching with French, and the coexistence of Arabic and Latin (Arabizi) scripts. This paper introduces DziriBOT, a hybrid intelligent conversational agent engineered to address these challenges. We propose a multi-layered architecture that integrates specialized Natural Language Understanding (NLU) with Retrieval-Augmented Generation (RAG), supporting both structured service flows and dynamic, knowledge-intensive responses grounded in curated enterprise documentation. To address the low-resource nature of Darja, we systematically evaluate three approaches: a sparse-feature Rasa pipeline, classical machine learning baselines, and transformer-based fine-tuning. Our experimental results show that the fine-tuned DziriBERT model achieves state-of-the-art performance, significantly outperforming the traditional baselines, particularly in handling orthographic noise and rare intents. DziriBOT thus provides a robust, scalable solution that bridges the gap between formal language models and the linguistic realities of Algerian users, offering a blueprint for dialect-aware automation in the regional market.