Aspect-Based Sentiment Analysis (ABSA) for Korean industrial reviews has received little attention in the existing literature. We propose an intuitive and effective framework for ABSA in low-resource languages such as Korean. The framework refines predicted labels by combining a translated benchmark with unlabeled Korean data. Using a model fine-tuned on the translated data, we pseudo-labeled the actual Korean NLI set. We then applied LaBSE- and \MSP{}-based filtering to this pseudo-NLI set and used it as an implicit feature, improving Aspect Category Detection and Polarity determination through additional training. With this dual filtering, the model bridged the gap between datasets and achieved positive results in Korean ABSA with minimal resources. Through additional data injection pipelines, our approach aims to make use of high-resource data and to help communities, whether corporate or individual, in low-resource language countries build effective models. Compared to English ABSA, our framework showed a difference of approximately 3\% in F1 score and accuracy. We release the dataset and our code for Korean ABSA at this link.
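The dual filtering step described above could look roughly like the following minimal sketch. It is illustrative only, not the released implementation. It assumes that \MSP{} denotes the maximum softmax probability of the pseudo-labeling classifier, and that each sample carries two precomputed sentence embeddings (for example from LaBSE) whose cosine similarity serves as the alignment score. Which sentence pair is compared, the field names (`emb_a`, `emb_b`, `logits`), and both threshold values are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw classifier scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def dual_filter(samples, sim_threshold=0.8, msp_threshold=0.9):
    """Keep pseudo-labeled samples that pass both filters.

    Each sample is a dict with hypothetical keys:
      "text"   : the sentence being pseudo-labeled
      "emb_a"  : embedding of one side of the sentence pair (e.g. from LaBSE)
      "emb_b"  : embedding of the other side of the pair
      "logits" : raw class scores from the fine-tuned classifier
    The thresholds are placeholders, not values from the paper.
    """
    kept = []
    for s in samples:
        sim = cosine(s["emb_a"], s["emb_b"])   # embedding-similarity filter
        probs = softmax(s["logits"])
        msp = max(probs)                        # maximum softmax probability
        if sim >= sim_threshold and msp >= msp_threshold:
            kept.append({
                "text": s["text"],
                "pseudo_label": probs.index(msp),
                "similarity": sim,
                "msp": msp,
            })
    return kept


samples = [
    {"text": "a", "emb_a": [1.0, 0.0], "emb_b": [1.0, 0.0], "logits": [5.0, 0.0]},
    {"text": "b", "emb_a": [1.0, 0.0], "emb_b": [0.0, 1.0], "logits": [5.0, 0.0]},
    {"text": "c", "emb_a": [1.0, 0.0], "emb_b": [1.0, 0.0], "logits": [0.1, 0.0]},
]
filtered = dual_filter(samples)
```

In this toy input, sample "b" fails the similarity filter and sample "c" fails the confidence filter, so only "a" is retained for additional training. In practice the embeddings would come from an actual encoder and the thresholds would be tuned on held-out data.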