Understanding customer feedback is becoming essential for companies that want to identify problems and improve their products and services. Text classification and sentiment analysis, using a variety of machine learning and deep learning approaches, can play a major role in analyzing this data. In this work, several transformer-based models are used to explore how effective they are on a German customer feedback dataset. In addition, these pre-trained models are further analyzed to determine whether adapting them to a specific domain with unlabeled data yields better results than using off-the-shelf pre-trained models. To evaluate the models, two downstream tasks from the GermEval 2017 shared task are considered. The experimental results show that transformer-based models achieve significant improvements over a fastText baseline and outperform both the published scores and previous models. For the subtask Relevance Classification, the best models achieve a micro-averaged $F_1$ score of 96.1 % on the first test set and 95.9 % on the second. For the subtask Polarity Classification, they achieve 85.1 % and 85.3 %, respectively.