Quantitative text analysis makes it possible to study literature in ways that were not available before the information era. Here we apply a comprehensive machine learning analysis to the works of William Shakespeare. The analysis shows clear changes in writing style over time, with the most significant changes in sentence length, the frequency of adjectives and adverbs, and the sentiments expressed in the text. Using machine learning to predict the year of each play from its stylometric measurements yields a Pearson correlation of 0.71 between the actual and predicted years, indicating that Shakespeare's writing style, as reflected by the quantitative measurements, changed over time. The analysis also shows that some plays are stylometrically closer to plays written earlier or later than their own year of composition. For instance, Romeo and Juliet is dated 1596 but is stylometrically more similar to plays Shakespeare wrote after 1600. The source code for the analysis is available for free download.
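As a rough illustration of the evaluation described in the abstract, the following minimal sketch computes one stylometric measurement (mean sentence length) and the Pearson correlation between actual and predicted years. It is not the authors' released code: the function names `mean_sentence_length` and `pearson` are hypothetical, the sentence splitter is a naive assumption, and the year values are made-up placeholders rather than results from the study.

```python
import math
import re


def mean_sentence_length(text):
    """Average number of words per sentence (naive split on ., ! and ?)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    return sum(len(s.split()) for s in sentences) / len(sentences)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Placeholder values for illustration only, NOT data from the paper.
actual_years = [1592, 1596, 1600, 1604, 1610]
predicted_years = [1594, 1601, 1599, 1603, 1608]

r = pearson(actual_years, predicted_years)
print(f"Pearson r = {r:.2f}")
```

In a fuller pipeline, each play would be reduced to a vector of such measurements, a regression model would predict the year from that vector, and `pearson` would be applied to held-out predictions. A play whose predicted year falls well before or after its accepted date would be the kind of outlier the abstract describes.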