Modern machine learning techniques for natural language processing can be used to automatically generate scripts for goal-oriented dialogue systems. This article presents a general framework for studying such automatic script generation. It describes a method for preprocessing dialogue datasets stored in JSON format, compares two methods for extracting user intents, one based on BERTopic and one on latent Dirichlet allocation, and compares two implemented algorithms for classifying user utterances in a goal-oriented dialogue system, one based on logistic regression and one on BERT transformer models. The BERT transformer approach using the bert-base-uncased model achieved the best results on three metrics, Precision (0.80), F1-score (0.78), and Matthews correlation coefficient (0.74), compared with the other methods.
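To make the three reported metrics concrete, the following is a minimal sketch of how Precision, F1-score, and the Matthews correlation coefficient can be computed for a multi-class utterance classifier. It is not the authors' evaluation code: the function name `classification_metrics`, the intent labels, and the choice of macro averaging for Precision and F1 are assumptions made for illustration, since the abstract does not state which averaging was used. The Matthews correlation coefficient follows the standard multi-class generalization.

```python
from collections import Counter
from math import sqrt


def classification_metrics(y_true, y_pred):
    """Return macro precision, macro F1 and multi-class MCC.

    Hypothetical helper for illustration; macro averaging is an assumption.
    """
    labels = sorted(set(y_true) | set(y_pred))
    true_counts = Counter(y_true)  # how often each class truly occurs
    pred_counts = Counter(y_pred)  # how often each class is predicted
    # Correct predictions per class (true positives).
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precisions, f1_scores = [], []
    for label in labels:
        tp = hits[label]
        precision = tp / pred_counts[label] if pred_counts[label] else 0.0
        recall = tp / true_counts[label] if true_counts[label] else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        f1_scores.append(f1)

    # Multi-class Matthews correlation coefficient.
    n = len(y_true)
    correct = sum(hits.values())
    cov_tp = correct * n - sum(true_counts[k] * pred_counts[k] for k in labels)
    cov_pp = n * n - sum(pred_counts[k] ** 2 for k in labels)
    cov_tt = n * n - sum(true_counts[k] ** 2 for k in labels)
    mcc = cov_tp / sqrt(cov_pp * cov_tt) if cov_pp and cov_tt else 0.0

    return {
        "precision": sum(precisions) / len(labels),
        "f1": sum(f1_scores) / len(labels),
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical intent labels, not taken from the article's data set.
    gold = ["book", "book", "cancel", "cancel", "greet", "greet"]
    predicted = ["book", "cancel", "cancel", "cancel", "greet", "book"]
    print(classification_metrics(gold, predicted))
```

Unlike Precision and F1, the Matthews correlation coefficient uses every cell of the confusion matrix, which is one reason it is often reported alongside them when intent classes are imbalanced.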