We present a hybrid approach to the automated measurement of vagueness and subjectivity in texts. We first introduce the expert system VAGO, illustrate it on a small benchmark of fact vs. opinion sentences, and then test it on the larger French press corpus FreSaDa to confirm the higher prevalence of subjective markers in satirical vs. regular texts. We then build a neural clone of VAGO, based on a BERT-like architecture and trained on the symbolic VAGO scores obtained on FreSaDa. Using explainability tools (LIME), we show the value of this neural version for enriching the lexicons of the symbolic version and for producing versions in other languages.
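To make the idea of a lexicon-based (symbolic) score concrete, here is a minimal sketch of how marker ratios could be computed for a sentence. It is an illustration only: the lexicon entries, the function name `marker_ratios`, and the ratio-based scoring are assumptions made for this example and do not reproduce VAGO's actual lexicons or scoring formula.

```python
import re

# Hypothetical toy lexicons: illustrative entries only, NOT taken from VAGO.
SUBJECTIVE_MARKERS = {"beautiful", "terrible", "good", "bad"}
VAGUE_MARKERS = {"some", "many", "about", "roughly"}


def marker_ratios(text):
    """Return the share of tokens found in each toy lexicon.

    This is an assumed, simplified scoring rule (marker count divided by
    token count), not the formula used by the expert system.
    """
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        # Avoid division by zero on empty input.
        return {"subjective": 0.0, "vague": 0.0}
    n = len(tokens)
    subjective = sum(t in SUBJECTIVE_MARKERS for t in tokens)
    vague = sum(t in VAGUE_MARKERS for t in tokens)
    return {"subjective": subjective / n, "vague": vague / n}


if __name__ == "__main__":
    print(marker_ratios("The film was terrible"))
    print(marker_ratios("About 20 people came"))
```

Scores of this kind, computed over a corpus, are the sort of symbolic targets on which a neural regressor could then be trained; the sketch stops short of that step.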