Aspect Extraction (AE) is a key task in Aspect-Based Sentiment Analysis (ABSA), yet it remains difficult to apply in low-resource, code-switched contexts such as Taglish, the mix of Tagalog and English commonly used in Filipino e-commerce reviews. This paper introduces a comprehensive AE pipeline designed for Taglish, combining rule-based, large language model (LLM)-based, and fine-tuning techniques to address both aspect identification and extraction. A Hierarchical Aspect Framework (HAF) is developed through multi-method topic modeling, along with a dual-mode tagging scheme covering explicit and implicit aspects. For aspect identification, four models are evaluated: a Rule-Based system, a Generative LLM (Gemini 2.0 Flash), and two Fine-Tuned Gemma-3 1B models trained on different datasets (Rule-Based vs. LLM-Annotated). Results indicate that the Generative LLM achieved the highest performance across all tasks (macro F1 of 0.91), demonstrating superior capability in handling implicit aspects. In contrast, the fine-tuned models showed limited performance due to dataset imbalance and architectural capacity constraints. This work contributes a scalable and linguistically adaptive framework for enhancing ABSA in diverse, code-switched environments.