This paper describes the construction of DECO-MWE, a linguistic resource of Korean multiword expressions (MWEs) for Feature-Based Sentiment Analysis (FBSA). Handling MWEs is a critical issue in FBSA because many such constructions are lexically idiosyncratic. To build sentiment MWE resources efficiently, we use the Local Grammar Graph (LGG) methodology: DECO-MWE is formalized as a Finite-State Transducer that represents the lexical-syntactic restrictions on MWEs. For this study, we built a corpus of cosmetics review texts, a domain in which MWEs occur particularly frequently. An empirical examination of the corpus led us to distinguish four types of MWEs, which DECO-MWE covers: Standard Polarity MWEs (SMWEs), Domain-Dependent Polarity MWEs (DMWEs), Compound Named Entity MWEs (EMWEs) and Compound Feature MWEs (FMWEs). On the test corpus, DECO-MWE achieves a retrieval F-measure of 0.806. The study has a twofold outcome: first, a sizeable general-purpose polarity MWE lexicon that may be broadly used in FBSA; second, a finite-state methodology for treating domain-dependent MWEs, such as idiosyncratic polarity expressions, named entity expressions and feature expressions, that may be reused to describe the linguistic properties of corpora in other domains.
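To make the finite-state idea concrete, the following is a minimal sketch of how a lexically constrained MWE pattern could be recognized and labeled, and how an F-measure could then be computed over the retrieved spans. It is not the DECO-MWE implementation. The arc table, the English placeholder tokens, the function names `tag_mwes` and `f_measure`, and the choice of the balanced F-measure (harmonic mean of precision and recall) are all assumptions made for illustration. The recognizer is a simplified acceptor that emits one category label per match, in the spirit of a transducer rather than a full LGG.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start index, end index exclusive, category label)

# Hypothetical toy graph: (state, token) -> next state.
# The tokens are English placeholders, not entries from DECO-MWE.
ARCS: Dict[Tuple[int, str], int] = {
    (0, "worth"): 1,
    (1, "the"): 2,
    (2, "money"): 3,  # the last slot only admits these two lexical items
    (2, "price"): 3,
}

# Accepting states and the label they output (assumed category assignment).
FINAL: Dict[int, str] = {3: "SMWE"}


def tag_mwes(tokens: List[str]) -> List[Span]:
    """Scan left to right and keep the longest accepted match at each position."""
    spans: List[Span] = []
    i = 0
    while i < len(tokens):
        state, j, best = 0, i, None
        # Follow arcs for as long as the next token is licensed by the graph.
        while j < len(tokens) and (state, tokens[j]) in ARCS:
            state = ARCS[(state, tokens[j])]
            j += 1
            if state in FINAL:
                best = (i, j, FINAL[state])
        if best is not None:
            spans.append(best)
            i = best[1]  # resume after the matched expression
        else:
            i += 1
    return spans


def f_measure(predicted: List[Span], gold: List[Span]) -> float:
    """Balanced F-measure over exact span-and-label matches."""
    pred_set, gold_set = set(predicted), set(gold)
    true_positives = len(pred_set & gold_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    sentence = "this cream is worth the price".split()
    print(tag_mwes(sentence))
```

Encoding the pattern as explicit arcs keeps the lexical restriction visible: a sequence such as "worth the effort" is rejected in this toy graph because no arc licenses the final token, which is the kind of constraint a lexical-syntactic graph is meant to capture.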