The efficiency of natural language processing has improved dramatically with the advent of machine learning models, particularly neural network-based solutions. However, some tasks remain challenging, especially in specific domains. In this paper, we present a cloud-based system that extracts insights from customer reviews using machine learning methods integrated into a pipeline. For topic modeling, our composite model combines transformer-based neural networks designed for natural language processing, vector embedding-based keyword extraction, and clustering. The elements of our model have been integrated and further developed to better meet the requirements of efficient information extraction and topic modeling of the extracted information, as well as user needs. Furthermore, our system can achieve better results than existing topic modeling and keyword extraction solutions for this task. We validate our approach and compare it with other state-of-the-art methods on publicly available benchmark datasets.
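To make the shape of such a pipeline concrete, the following is a minimal sketch of the three stages named above (embedding, embedding-based keyword extraction, clustering). It is not the system presented in this paper: a bag-of-words vector stands in for the transformer encoder, a greedy similarity-threshold procedure stands in for the clustering step, and every name (`embed`, `extract_keywords`, `cluster`, `summarize_topics`, `REVIEWS`, the stopword list, the threshold value) is a hypothetical choice made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical toy data and stopword list, for illustration only.
REVIEWS = [
    "Battery life is great, the battery lasts two days.",
    "The battery drains fast and battery charging is slow.",
    "Delivery was late and the delivery box was damaged.",
    "Fast delivery, the delivery driver was friendly.",
]
STOPWORDS = {"the", "a", "is", "was", "and", "very", "it", "but", "to", "of"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def embed(tokens):
    """Assumption: a sparse bag-of-words vector replaces a transformer encoder."""
    return Counter(tokens)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def extract_keywords(text, top_n=1):
    """Rank candidate words by similarity between word and document vectors."""
    tokens = tokenize(text)
    doc_vec = embed(tokens)
    candidates = sorted(set(tokens))
    ranked = sorted(candidates, key=lambda w: (-cosine(embed([w]), doc_vec), w))
    return ranked[:top_n]


def cluster(vectors, threshold=0.3):
    """Greedy stand-in for clustering: join the most similar cluster whose
    centroid similarity reaches the threshold, otherwise open a new cluster."""
    clusters = []
    for i, vec in enumerate(vectors):
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(vec, c["centroid"])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append({"centroid": Counter(vec), "members": [i]})
        else:
            best["centroid"].update(vec)  # centroid kept as a running sum
            best["members"].append(i)
    return [c["members"] for c in clusters]


def summarize_topics(reviews, threshold=0.3, top_n=1):
    """Cluster the reviews, then label each cluster with its top keywords."""
    vectors = [embed(tokenize(r)) for r in reviews]
    topics = []
    for members in cluster(vectors, threshold):
        joined = " ".join(reviews[i] for i in members)
        topics.append((extract_keywords(joined, top_n), members))
    return topics
```

On the four toy reviews, `summarize_topics(REVIEWS)` groups the first two reviews under the keyword `battery` and the last two under `delivery`. In a realistic setting the `embed` function would be replaced by a sentence-level transformer encoder, and the greedy grouping by a proper clustering algorithm.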