Effectively analyzing online review data is essential across industries. However, many existing studies are limited to specific domains and languages or depend on supervised learning approaches that require large-scale labeled datasets. To address these limitations, we propose a multilingual, scalable, and unsupervised framework for cross-domain aspect detection, designed for multi-aspect labeling of multilingual, multi-domain review data. In this study, we apply automatic labeling to Korean and English review datasets spanning various domains and assess the quality of the generated labels through extensive experiments. Aspect category candidates are first extracted through clustering, and each review is then represented as an aspect-aware embedding vector using negative sampling. To evaluate the framework, we perform multi-aspect labeling and fine-tune several pretrained language models to measure the effectiveness of the automatically generated labels. Results show that these models achieve high performance, demonstrating that the labels are suitable for training. Furthermore, comparisons with publicly available large language models highlight the framework's superior consistency and scalability when processing large-scale data. A human evaluation also confirms that the quality of the automatic labels is comparable to that of manually created labels. This study demonstrates the potential of a robust multi-aspect labeling approach that overcomes the limitations of supervised methods and adapts to multilingual, multi-domain environments. Future research will explore automatic review summarization and the integration of artificial intelligence agents to further improve the efficiency and depth of review analysis.
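The pipeline summarized above, clustering to obtain aspect-category candidates followed by similarity-based multi-aspect labeling, can be illustrated with a minimal sketch. The snippet below is not the paper's implementation: TF-IDF vectors stand in for the aspect-aware embeddings learned with negative sampling, and the review texts, number of aspects, and similarity threshold are hypothetical placeholders.

```python
"""Illustrative sketch only: cluster review vectors to obtain aspect-category
candidates, then assign multiple aspect labels per review by similarity.
TF-IDF is a stand-in for the paper's aspect-aware embeddings (trained with
negative sampling); all data and thresholds are placeholders."""
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

reviews = [
    "The room was clean but the staff was unfriendly.",
    "Great price, terrible battery life.",
    "Fast delivery and friendly service.",
    "Battery drains quickly, but it was cheap.",
]

# 1) Embed reviews (stand-in for the framework's learned embeddings).
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(reviews)

# 2) Cluster to obtain aspect-category candidates; the cluster centroids
#    serve as candidate aspect vectors.
n_aspects = 3                                   # hypothetical number of aspects
kmeans = KMeans(n_clusters=n_aspects, n_init=10, random_state=0).fit(X)
aspect_vectors = kmeans.cluster_centers_

# 3) Multi-aspect labeling: a review receives every aspect whose centroid
#    similarity exceeds a threshold, so one review can carry several labels.
threshold = 0.15                                # hypothetical cut-off
similarity = cosine_similarity(X, aspect_vectors)
labels = (similarity >= threshold).astype(int)  # multi-hot label matrix

for review, row in zip(reviews, labels):
    assigned = [f"aspect_{j}" for j, flag in enumerate(row) if flag]
    print(review, "->", assigned)
```

In this toy version, the threshold controls how readily a review is tagged with multiple aspects; the paper's learned, aspect-aware representations would replace both the TF-IDF vectors and the raw centroids.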
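The evaluation step, fine-tuning pretrained language models on the automatically generated labels, can likewise be sketched as multi-label classification. The snippet below assumes Hugging Face Transformers; the model name (bert-base-multilingual-cased), the aspect set, and the toy training examples are illustrative assumptions, not details taken from the paper.

```python
"""Illustrative sketch: fine-tune a pretrained model on automatically
generated multi-aspect labels, framed as multi-label classification.
Model, aspects, and data are placeholders, not the paper's setup."""
import torch
from torch.utils.data import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

ASPECTS = ["price", "service", "quality"]        # hypothetical aspect set

class ReviewDataset(Dataset):
    """Wraps reviews and their multi-hot aspect label vectors."""
    def __init__(self, texts, label_vectors, tokenizer):
        self.enc = tokenizer(texts, truncation=True, padding=True)
        self.labels = label_vectors

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i], dtype=torch.float)
        return item

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased",
    num_labels=len(ASPECTS),
    problem_type="multi_label_classification",   # uses BCE-with-logits loss
)

# Toy examples standing in for reviews labeled by the automatic framework.
train_texts = ["Cheap but the staff was rude.", "Great build quality."]
train_labels = [[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=ReviewDataset(train_texts, train_labels, tokenizer),
)
trainer.train()
```

The multi-label framing matters because each review can mention several aspects at once; the quality of the automatic labels is then measured by how well models fine-tuned on them perform.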