User-generated content from social media is produced in many languages, making it technically challenging to compare the themes discussed in one domain across different cultures and regions. Such comparisons matter in a globalized world, for example in market research, where people from different nations and markets might have different requirements for a product. We propose a simple, modern, and effective method for building a single topic model with sentiment analysis that covers multiple languages simultaneously, based on a pre-trained state-of-the-art deep neural network for natural language understanding. To demonstrate its feasibility, we apply the model to newspaper articles and user comments from a specific domain, namely organic food products and related consumption behavior. The themes match across languages. Additionally, we obtain a high proportion of stable and domain-relevant topics, a meaningful relation between topics and their respective textual contents, and an interpretable representation for social media documents. Marketing can potentially benefit from our method, since it provides an easy-to-use means of addressing specific customer interests from different market regions around the globe. For reproducibility, we provide the code, data, and results of our study.