One of the challenges of natural language understanding is dealing with the subjectivity of sentences, which may express opinions and emotions that add layers of complexity and nuance. Sentiment analysis is a field that aims to extract and analyze these subjective elements from text, and it can be applied at different levels of granularity, such as document, paragraph, sentence, or aspect. Aspect-based sentiment analysis is a well-studied topic with many available datasets and models. However, there is no clear definition of what makes a sentence difficult for aspect-based sentiment analysis. In this paper, we explore this question through an experiment with three datasets, "Laptops", "Restaurants", and "MTSC" (Multi-Target-dependent Sentiment Classification), and a merged version of these three datasets. We study the impact of domain diversity and syntactic diversity on difficulty. We use a combination of classifiers to identify the most difficult sentences and analyze their characteristics. We define sentence difficulty in two ways. The first is binary and labels a sentence as difficult if the classifiers fail to correctly predict the sentiment polarity. The second is a six-level scale based on how many of the five best-performing classifiers correctly predict the sentiment polarity. We also define nine linguistic features that, combined, aim to estimate difficulty at the sentence level.
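The two difficulty definitions can be illustrated with a minimal sketch. Several details here are assumptions, not taken from the paper: the function names `binary_difficulty` and `graded_difficulty` are hypothetical, the binary label is assumed to mean that none of the classifiers predicts the gold polarity, and the six-level scale is assumed to run from 0 (all five classifiers correct) to 5 (none correct).

```python
from typing import Sequence


def binary_difficulty(gold: str, predictions: Sequence[str]) -> bool:
    """Return True if the sentence counts as difficult.

    Assumption: "the classifiers fail" is read as "no classifier
    predicts the gold polarity"; the paper may use a different rule.
    """
    return all(pred != gold for pred in predictions)


def graded_difficulty(gold: str, top5_predictions: Sequence[str]) -> int:
    """Return a difficulty level from 0 (easiest) to 5 (hardest).

    The level is the number of the five best-performing classifiers
    that miss the gold polarity. The direction of the scale is an
    assumption made for this sketch.
    """
    if len(top5_predictions) != 5:
        raise ValueError("expected predictions from exactly five classifiers")
    correct = sum(pred == gold for pred in top5_predictions)
    return 5 - correct


# Hypothetical predictions for one aspect whose gold polarity is "negative".
preds = ["negative", "neutral", "negative", "positive", "negative"]
is_difficult = binary_difficulty("negative", preds)  # some classifiers are right
level = graded_difficulty("negative", preds)         # two classifiers miss
```

Counting correct predictions among five classifiers gives six possible values (0 through 5), which is why the graded scale has six levels.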