Review score prediction requires understanding review text, a critical real-world application of natural language processing. Because product reviews span dissimilar text domains, a common practice is to fine-tune BERT models on reviews from different domains. However, there has not yet been an empirical study of the cross-domain behavior of BERT models across the various tasks of product review understanding. In this project, we investigate BERT text classification models fine-tuned on single-domain and multi-domain Amazon review data. We find that single-domain models achieve marginally better performance on their own domain than multi-domain models. However, multi-domain models outperform single-domain models in three settings: on multi-domain data, on single-domain data from a domain the single-domain model was not fine-tuned on, and on average across all tests. Although single-domain fine-tuning can yield slight gains in accuracy, multi-domain models that perform well across domains can reduce computational resources and costs.
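The comparison described above (each model scored on its own domain, on unseen domains, and on average over all tests) can be laid out as an accuracy grid. The following is a minimal sketch of that bookkeeping only, not the authors' code. The names `cross_domain_grid` and `summarize`, the convention that a single-domain model shares its name with its domain, and the keyword-based stand-in classifiers are all assumptions made for illustration; in practice each model would be a fine-tuned BERT classifier, and the toy data says nothing about the reported results.

```python
from statistics import mean
from typing import Callable, Dict, List, Optional, Tuple

Example = Tuple[str, int]      # (review text, gold rating label)
Model = Callable[[str], int]   # maps review text to a predicted label


def accuracy(model: Model, examples: List[Example]) -> float:
    """Fraction of examples whose predicted label matches the gold label."""
    return mean(1.0 if model(text) == label else 0.0 for text, label in examples)


def cross_domain_grid(models: Dict[str, Model],
                      test_sets: Dict[str, List[Example]]) -> Dict[str, Dict[str, float]]:
    """Accuracy of every model on every test domain."""
    return {name: {domain: accuracy(model, examples)
                   for domain, examples in test_sets.items()}
            for name, model in models.items()}


def summarize(grid: Dict[str, Dict[str, float]],
              model_name: str) -> Dict[str, Optional[float]]:
    """Split one model's row into in-domain, out-of-domain, and overall means.

    Assumption: a single-domain model is named after the domain it was
    fine-tuned on, so a model with no matching domain has no in-domain score.
    """
    row = grid[model_name]
    in_dom = [acc for domain, acc in row.items() if domain == model_name]
    out_dom = [acc for domain, acc in row.items() if domain != model_name]
    return {
        "in_domain": mean(in_dom) if in_dom else None,
        "out_of_domain": mean(out_dom) if out_dom else None,
        "overall": mean(row.values()),
    }


# Hypothetical toy data and stand-in classifiers, for illustration only.
def keyword_model(word: str) -> Model:
    """Predict 5 stars if `word` appears in the text, otherwise 1 star."""
    return lambda text: 5 if word in text else 1


toy_tests = {
    "books": [("great read", 5), ("dull plot", 1)],
    "electronics": [("great battery", 5), ("broke fast", 1)],
}
toy_models = {
    "books": keyword_model("read"),
    "electronics": keyword_model("battery"),
    "multi": keyword_model("great"),
}
grid = cross_domain_grid(toy_models, toy_tests)
```

Calling `summarize(grid, name)` for each model then gives the three quantities the abstract compares. Keeping the per-domain grid separate from the summary makes it easy to see whether an average is driven by in-domain or out-of-domain tests.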