The surge of user-generated online content presents a wealth of insights into customer preferences and market trends. However, the highly diverse, complex, and context-rich nature of such content poses significant challenges to traditional opinion mining approaches. To address this, we introduce the Online Opinion Mining Benchmark (OOMB), a novel dataset and evaluation protocol designed to assess the ability of large language models (LLMs) to mine opinions effectively from diverse and intricate online environments. OOMB provides extensive (entity, feature, opinion) tuple annotations and a comprehensive opinion-centric summary that highlights the key opinion topics within each piece of content, thereby enabling the evaluation of both the extractive and abstractive capabilities of models. Through the proposed benchmark, we conduct a comprehensive analysis of which aspects remain challenging and where LLMs exhibit adaptability, exploring whether they can effectively serve as opinion miners in realistic online scenarios. This study lays the foundation for LLM-based opinion mining and discusses directions for future research in this field.
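The (entity, feature, opinion) tuple format mentioned above can be illustrated with a minimal sketch. The field names, the example review, and the extracted tuples below are illustrative assumptions, not drawn from the OOMB dataset itself.

```python
from typing import NamedTuple

# Hypothetical representation of one annotated opinion tuple;
# the structure mirrors the (entity, feature, opinion) triples
# described in the abstract.
class OpinionTuple(NamedTuple):
    entity: str   # the object being discussed, e.g. a product
    feature: str  # the aspect of the entity under evaluation
    opinion: str  # the opinion expressed about that feature

# A single user review can yield multiple tuples:
review = "The phone's battery lasts all day, but the camera is disappointing."
tuples = [
    OpinionTuple(entity="phone", feature="battery", opinion="lasts all day"),
    OpinionTuple(entity="phone", feature="camera", opinion="disappointing"),
]

for t in tuples:
    print(f"({t.entity}, {t.feature}, {t.opinion})")
```

An extractive evaluation would compare such predicted tuples against gold annotations, while an abstractive evaluation would instead score a generated summary of the dominant opinion topics.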