With the recent growth of social media, applying NLP techniques to social media data has become an emerging research direction. Business organizations in particular can benefit from analyzing social media discourse, which provides an external perspective on consumer behavior. NLP applications such as intent detection, sentiment classification, and text summarization can help FinTech organizations use social media language data to find useful external insights, which can in turn feed downstream NLP tasks. In particular, a summary that highlights users' intents and sentiments can give these organizations an external perspective that helps them better manage their products, offers, promotional campaigns, etc. However, challenges such as the lack of labeled domain-specific datasets impede further exploration of these tasks in the FinTech domain. To overcome these challenges, we design an unsupervised, phrase-based summary generation method for social media data that uses 'Action-Object' pairs (intent phrases). We compare the proposed method with other key-phrase based summary generation methods in terms of how much contextual information from various Reddit discussion threads is retained in the resulting summaries. We introduce "Context Metrics", such as the number of unique words, Action-Object pairs, and noun chunks, to evaluate the contextual information retrieved from the source text in these phrase-based summaries. We demonstrate that our methods significantly outperform the baseline on these metrics, thus providing a qualitative and quantitative measure of their efficacy. The proposed framework has been deployed as a web utility portal hosted within Amex.
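As a rough illustration of the "Context Metrics" named above, the following minimal sketch counts unique words, Action-Object pairs, and noun chunks in a phrase-based summary. The abstract does not say how the pairs or chunks are extracted, so everything here is an assumption: the input is taken to be already POS-tagged by some tagger, a pair is approximated as a verb followed by a nearby noun run, and a noun chunk is approximated as a run of adjectives and nouns ending in a noun. The names `action_object_pairs`, `noun_chunks`, `context_metrics`, and the `window` parameter are hypothetical.

```python
from typing import Dict, List, Tuple

# (word, coarse POS tag). The tags are assumed to come from any POS tagger.
Token = Tuple[str, str]


def action_object_pairs(tokens: List[Token], window: int = 3) -> List[Tuple[str, str]]:
    """Pair each VERB with the first noun run that starts within `window` tokens.

    This is a heuristic stand-in, not the extraction rule used in the paper.
    """
    pairs = []
    for i, (word, tag) in enumerate(tokens):
        if tag != "VERB":
            continue
        limit = min(len(tokens), i + 1 + window)
        j = i + 1
        # Skip determiners, pronouns, etc.; stop at a noun or at another verb.
        while j < limit and tokens[j][1] not in ("NOUN", "VERB"):
            j += 1
        if j < limit and tokens[j][1] == "NOUN":
            k = j
            while k < len(tokens) and tokens[k][1] == "NOUN":
                k += 1
            obj = " ".join(w.lower() for w, _ in tokens[j:k])
            pairs.append((word.lower(), obj))
    return pairs


def noun_chunks(tokens: List[Token]) -> List[str]:
    """Approximate noun chunks as maximal ADJ/NOUN runs that end in a NOUN."""
    chunks, run = [], []
    for word, tag in tokens + [("", "END")]:  # sentinel flushes the last run
        if tag in ("ADJ", "NOUN"):
            run.append((word.lower(), tag))
            continue
        while run and run[-1][1] != "NOUN":  # drop trailing adjectives
            run.pop()
        if run:
            chunks.append(" ".join(w for w, _ in run))
        run = []
    return chunks


def context_metrics(tokens: List[Token]) -> Dict[str, int]:
    """Count distinct words, Action-Object pairs, and noun chunks."""
    words = {w.lower() for w, tag in tokens if tag != "PUNCT"}
    return {
        "unique_words": len(words),
        "action_object_pairs": len(set(action_object_pairs(tokens))),
        "noun_chunks": len(set(noun_chunks(tokens))),
    }


# Hand-tagged toy sentence, invented for illustration only.
sample = [
    ("I", "PRON"), ("want", "VERB"), ("to", "PART"), ("cancel", "VERB"),
    ("my", "PRON"), ("credit", "NOUN"), ("card", "NOUN"), ("and", "CCONJ"),
    ("redeem", "VERB"), ("reward", "NOUN"), ("points", "NOUN"), (".", "PUNCT"),
]
print(action_object_pairs(sample))
print(context_metrics(sample))
```

Under these assumptions, a summary that scores higher on all three counts retains more of the source thread's distinct vocabulary and intent phrases. A real pipeline would likely rely on a dependency parser rather than the window heuristic used here.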