In social media networks, users produce large amounts of text at any time, giving researchers a valuable means of mining personality-related information. Personality detection from user-generated text is a widely applicable method for building user portraits. However, noise in social media texts hinders personality detection, and previous studies have not fully addressed this challenge. Inspired by the scanning reading technique, we propose an attention-based information extraction mechanism (AIEM) for long texts. AIEM quickly locates valuable pieces of information and focuses more attention on the deep semantics of these key pieces. We then propose a novel attention-based denoising framework (ADF) for personality detection, which achieves state-of-the-art performance on two commonly used datasets. Notably, we obtain an average accuracy improvement of 10.2% on the gold-standard Twitter-Myers-Briggs Type Indicator (Twitter-MBTI) dataset. Our code is publicly available on GitHub. Finally, we shed light on how AIEM magnifies personality-related signals.
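The abstract describes AIEM only at a high level. The sketch below is a rough illustration of the general "scan, then focus" idea over a long text. It is not the authors' AIEM. It assumes each user's text has already been split into segments (for example, individual posts) with precomputed embedding vectors. It also assumes a hypothetical query vector that scores each segment. The names `scan_and_focus`, `query`, and `k` are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scan_and_focus(segments: Sequence[Vector], query: Vector, k: int) -> Tuple[List[int], Vector]:
    """Hypothetical two-stage attention over segment embeddings (not the paper's AIEM).

    Stage 1 ("scan"): score every segment against `query` with scaled
    dot-product attention and keep the k highest-weighted segments.
    Stage 2 ("focus"): re-normalize attention over the kept segments only
    and return their attention-weighted average as the text representation.
    """
    if not segments:
        raise ValueError("need at least one segment")
    k = max(1, min(k, len(segments)))  # clamp k to a valid range
    scale = math.sqrt(len(query))
    scores = [dot(seg, query) / scale for seg in segments]
    weights = softmax(scores)
    # Indices of the k segments with the largest attention weight, kept in original order.
    ranked = sorted(range(len(segments)), key=lambda i: weights[i], reverse=True)
    top = sorted(ranked[:k])
    # Attention is recomputed over the surviving segments only.
    focus = softmax([scores[i] for i in top])
    dim = len(segments[0])
    pooled = [sum(w * segments[i][d] for w, i in zip(focus, top)) for d in range(dim)]
    return top, pooled


if __name__ == "__main__":
    posts = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
    kept, rep = scan_and_focus(posts, query=[1.0, 0.0], k=2)
    print(kept, rep)
```

Keeping only the top-k segments discards low-scoring, likely noisy posts before pooling. This is one simple way to express the denoising intuition. A trained model would learn the query (and the segment encoder) end to end rather than fixing them by hand.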