This paper introduces a Bayesian framework for measuring the degree of association between categorical random variables. The method is grounded in the formal definition of variable independence and is implemented using Markov Chain Monte Carlo (MCMC) techniques. Unlike the techniques commonly employed in Association Rule Learning, this approach allows clear and precise estimation of confidence intervals and of the statistical significance of the measured degree of association. We applied the method to non-exclusive emotions identified by annotators in 4,613 tweets written in Portuguese. The analysis revealed pairs of emotions that are associated and pairs that are mutually opposed. Moreover, the method identifies hierarchical relations between categories, a feature observed in our data, and we use it to cluster emotions into basic-level groups.
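To make the idea of an independence-based association measure with an uncertainty interval concrete, here is a minimal Python sketch. It is not the paper's implementation. It assumes two binary labels summarized by a 2x2 table of counts, a uniform Dirichlet prior, and the ratio P(A,B) / (P(A) P(B)) as the association measure, which equals 1 under independence. Because this toy model is conjugate, the posterior is sampled directly rather than with MCMC. The function names and the counts are hypothetical.

```python
import numpy as np


def association_posterior(n11, n10, n01, n00, draws=20000, seed=0):
    """Posterior samples of P(A,B) / (P(A) P(B)) for two binary labels.

    n11: items with both labels, n10: only A, n01: only B, n00: neither.
    Assumption: uniform Dirichlet(1, 1, 1, 1) prior over the four cells.
    """
    rng = np.random.default_rng(seed)
    counts = np.array([n11, n10, n01, n00], dtype=float)
    # Conjugacy: Dirichlet prior + multinomial counts -> Dirichlet posterior.
    p = rng.dirichlet(counts + 1.0, size=draws)
    p_a = p[:, 0] + p[:, 1]  # marginal P(A)
    p_b = p[:, 0] + p[:, 2]  # marginal P(B)
    # Ratio is 1 under independence, >1 for association, <1 for opposition.
    return p[:, 0] / (p_a * p_b)


def summarize(samples, level=0.95):
    """Return (median, lower, upper) of an equal-tailed posterior interval."""
    tail = 100.0 * (1.0 - level) / 2.0
    lower, upper = np.percentile(samples, [tail, 100.0 - tail])
    return float(np.median(samples)), float(lower), float(upper)


# Hypothetical counts, chosen only for illustration.
associated = summarize(association_posterior(300, 100, 100, 500))
independent = summarize(association_posterior(100, 100, 400, 400))
opposed = summarize(association_posterior(20, 380, 380, 220))
```

Under these assumptions, a pair would be read as associated when the whole interval lies above 1, as mutually opposed when it lies below 1, and as compatible with independence when the interval contains 1.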