This paper introduces a novel Bayesian method for measuring the degree of association between categorical variables. The method is grounded in the formal definition of variable independence and is implemented using Markov chain Monte Carlo (MCMC) techniques. Unlike existing methods, it does not assume prior knowledge of the total number of occurrences of any category, which makes it particularly well suited to applications such as sentiment analysis. We applied the method to a dataset of 4,613 tweets written in Portuguese, each annotated for 30 possibly overlapping emotional categories. This analysis identified pairs of emotions that are associated as well as pairs that are mutually exclusive. The method also identifies hierarchical relations between categories, a feature observed in our data, and was used to cluster emotions into basic-level groups.
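To make the idea of an independence-based association measure concrete, the sketch below estimates the posterior of log[P(A,B) / (P(A) P(B))] for two binary labels. That quantity is zero under independence, positive for associated labels, and strongly negative for nearly mutually exclusive ones. This is not the method of the paper: it assumes a conjugate Dirichlet prior over the four cells of a 2x2 co-occurrence table, samples the posterior directly instead of using MCMC, and conditions on a fixed total count, which is exactly the assumption the paper's method avoids. The function names, the uniform prior, and the counts in the usage lines are all hypothetical.

```python
import math
import random


def dirichlet_sample(alphas, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def association_posterior(n11, n10, n01, n00, prior=1.0, n_samples=5000, seed=0):
    """Posterior samples of log[P(A,B) / (P(A) P(B))] for two binary labels.

    n11: items with both labels, n10: only A, n01: only B, n00: neither.
    Assumption (not from the paper): a symmetric Dirichlet(prior) over the
    four cells, so the posterior is Dirichlet(counts + prior).
    """
    rng = random.Random(seed)
    alphas = [n11 + prior, n10 + prior, n01 + prior, n00 + prior]
    samples = []
    for _ in range(n_samples):
        p11, p10, p01, _p00 = dirichlet_sample(alphas, rng)
        p_a = p11 + p10  # marginal probability of label A
        p_b = p11 + p01  # marginal probability of label B
        samples.append(math.log(p11 / (p_a * p_b)))
    return samples


def summarize(samples, level=0.95):
    """Return the posterior mean and an equal-tailed credible interval."""
    ordered = sorted(samples)
    lo_idx = int((1 - level) / 2 * len(ordered))
    hi_idx = int((1 + level) / 2 * len(ordered)) - 1
    mean = sum(ordered) / len(ordered)
    return mean, ordered[lo_idx], ordered[hi_idx]


# Hypothetical counts, invented for illustration only.
associated = summarize(association_posterior(40, 10, 10, 40))
exclusive = summarize(association_posterior(1, 49, 49, 1))
print("associated pair:", associated)
print("near-exclusive pair:", exclusive)
```

A credible interval that lies entirely above zero suggests association, and one entirely below zero suggests the labels tend to exclude each other. An interval that straddles zero leaves independence plausible.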