We present a novel method for mining opinions from text collections using generative language models trained on data collected from different populations. We describe the basic definitions, the methodology, and a generic algorithm for opinion insight mining. We demonstrate the performance of our method in an experiment in which a pre-trained generative model is fine-tuned on specifically tailored content containing unnatural, fully annotated opinions. We show that our approach can learn the opinions and transfer them to the semantic classes while maintaining the proportion of polarisation. Finally, we demonstrate the use of an insight mining system to scale up the discovery of opinion insights from a real text corpus.