Background: Chatbots have become increasingly popular among practitioners for software development tasks, driven by their potential to reduce costs and accelerate the software development process. Chatbots understand users' queries through their Natural Language Understanding (NLU) component. To yield reasonable performance, NLUs must be trained on extensive, high-quality datasets that express the many ways users may interact with chatbots. However, previous studies show that creating a high-quality training dataset for software engineering chatbots is expensive in terms of both resources and time. Aims: Therefore, in this paper, we present an automated transformer-based approach to augment software engineering chatbot datasets. Method: Our approach combines traditional natural language processing techniques with the BART transformer to augment a dataset by generating queries through synonym replacement and paraphrasing. We evaluate the impact of the augmentation approach on the performance of the Rasa NLU using three software engineering datasets. Results: Overall, the augmentation approach shows promising results in improving Rasa's performance, generating queries with varying sentence structures while preserving their original semantics. Furthermore, it increases Rasa's confidence in its intent classification for correctly classified intents. Conclusions: We believe that our study helps practitioners improve the performance of their chatbots and guides future research in proposing augmentation techniques for SE chatbots.
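To make the synonym-replacement idea concrete, here is a minimal sketch of how such a step might expand an intent-labelled training set. It is not the authors' implementation: the `SYNONYMS` table, the example query, the `list_issues` intent, and the helper names `synonym_variants` and `augment_dataset` are hypothetical. A real pipeline would draw synonyms from a lexical resource rather than a hand-written dict. The BART paraphrasing step is not shown.

```python
# Hypothetical synonym table for illustration only; a real pipeline would
# obtain synonyms from a lexical resource instead of a hand-written dict.
SYNONYMS: dict[str, list[str]] = {
    "show": ["display", "list"],
    "open": ["unresolved"],
    "issues": ["bugs", "tickets"],
}


def synonym_variants(query: str, synonyms: dict[str, list[str]], limit: int = 10) -> list[str]:
    """Generate new queries by swapping exactly one word for a synonym.

    Changing a single token at a time keeps each variant close to the original
    query, so it can reuse the original intent label.
    """
    tokens = query.split()
    seen = {query}  # never return the original query itself
    variants: list[str] = []
    for i, token in enumerate(tokens):
        for alt in synonyms.get(token.lower(), []):
            if len(variants) >= limit:
                return variants
            candidate = " ".join(tokens[:i] + [alt] + tokens[i + 1:])
            if candidate not in seen:  # drop duplicates, preserve order
                seen.add(candidate)
                variants.append(candidate)
    return variants


def augment_dataset(
    examples: list[tuple[str, str]],
    synonyms: dict[str, list[str]],
    limit: int = 10,
) -> list[tuple[str, str]]:
    """Return the original (query, intent) pairs plus synonym variants sharing each intent."""
    augmented = list(examples)
    for query, intent in examples:
        augmented.extend((v, intent) for v in synonym_variants(query, synonyms, limit))
    return augmented


if __name__ == "__main__":
    data = [("show open issues", "list_issues")]
    for q, intent in augment_dataset(data, SYNONYMS):
        print(intent, "|", q)
```

Run on the single example query, this sketch yields five variants, such as "display open issues" and "show unresolved issues", all labelled `list_issues`. Paraphrasing with a sequence-to-sequence model such as BART would complement this step by varying sentence structure rather than only individual words.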