Our work addresses unsupervised Aspect Category Detection using a small set of seed words. Recent works focus on learning embedding spaces for seed words and sentences in order to measure the similarity between sentences and aspects. However, aspect representations are limited by the quality of the initial seed words, and model performance is compromised by noise. To mitigate these limitations, we propose a simple framework that automatically enhances the quality of the initial seed words and selects high-quality sentences for training instead of using the entire dataset. Our main ideas are to add a number of seed words to the initial set and to treat noise resolution as a data augmentation problem for a low-resource task. In addition, we jointly train Aspect Category Detection with Aspect Term Extraction and Aspect Term Polarity to further enhance performance. This approach facilitates shared representation learning, allowing Aspect Category Detection to benefit from the additional guidance offered by the other tasks. Extensive experiments demonstrate that our framework surpasses strong baselines on standard datasets.