Modern machine learning techniques from natural language processing can be used to automatically generate scripts for goal-oriented dialogue systems. This article presents a general framework for studying such script generation. It describes a method for preprocessing dialogue datasets in JSON format, compares two methods for extracting user intents, one based on BERTopic and one on latent Dirichlet allocation, and compares two implemented algorithms for classifying user utterances in a goal-oriented dialogue system, one based on logistic regression and one on BERT transformer models. The BERT transformer approach using the bert-base-uncased model outperformed the other methods on three metrics: Precision (0.80), F1-score (0.78), and Matthews correlation coefficient (0.74).
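As an illustration of the three metrics reported in the abstract, the following is a minimal sketch of how Precision, F1-score, and the Matthews correlation coefficient could be computed for a multi-class utterance classifier. The abstract does not state which averaging scheme was used, so macro averaging is an assumption here, as are the function names and the toy intent labels; the multi-class MCC follows the standard confusion-matrix generalization.

```python
from collections import Counter
from math import sqrt


def macro_precision_f1(y_true, y_pred):
    """Return (macro precision, macro F1) over all labels seen in either list.

    Macro averaging is an assumption; the abstract does not name the scheme.
    """
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, f1_scores = [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        f1_scores.append(f1)
    return sum(precisions) / len(labels), sum(f1_scores) / len(labels)


def multiclass_mcc(y_true, y_pred):
    """Matthews correlation coefficient generalized to several classes."""
    n = len(y_true)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_counts = Counter(y_true)   # how often each class truly occurs
    pred_counts = Counter(y_pred)   # how often each class is predicted
    numerator = correct * n - sum(true_counts[k] * pred_counts[k]
                                  for k in true_counts)
    denominator = sqrt(
        (n * n - sum(v * v for v in pred_counts.values()))
        * (n * n - sum(v * v for v in true_counts.values()))
    )
    # By convention, return 0.0 when the denominator vanishes.
    return numerator / denominator if denominator else 0.0


# Hypothetical intent labels, for illustration only (not from the article).
example_true = ["book", "book", "cancel", "cancel", "info", "info"]
example_pred = ["book", "book", "cancel", "info", "info", "info"]

if __name__ == "__main__":
    precision, f1 = macro_precision_f1(example_true, example_pred)
    mcc = multiclass_mcc(example_true, example_pred)
    print(f"precision={precision:.2f} f1={f1:.2f} mcc={mcc:.2f}")
```

Unlike Precision and F1-score, the Matthews correlation coefficient uses the full confusion matrix, which makes it a useful complement when intent classes are imbalanced.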