Stylistic analysis of text is a key task in research areas ranging from authorship attribution to forensic analysis and personality profiling. Existing approaches to stylistic analysis are plagued by issues such as topic influence, a lack of discriminability across large numbers of authors, and the need for large amounts of diverse data. In this paper, the sources of these issues are identified, along with the necessity of a cognitive perspective on authorial style for addressing them. A novel feature representation, called Trajectory-based Style Estimation (TraSE), is introduced to support this purpose. Authorship attribution experiments with over 27,000 authors and 1.4 million samples in a cross-domain scenario achieved 90% attribution accuracy, suggesting that the feature representation is immune to such negative influences and is an excellent candidate for stylistic analysis. Finally, a qualitative analysis of TraSE is performed using physical human characteristics, such as age, to validate its claim of capturing cognitive traits.