Quantitative methods for analyzing text have provided ways of examining literature that were not available in the pre-information era. Here we apply a comprehensive machine learning analysis to the work of William Shakespeare. The analysis shows a clear change in writing style over time, with the most significant changes in sentence length, the frequency of adjectives and adverbs, and the sentiments expressed in the text. Applying machine learning to make a stylometric prediction of the year of each play gives a Pearson correlation of 0.71 between the actual and predicted year, indicating that Shakespeare's writing style, as reflected by the quantitative measurements, changed over time. The analysis also shows that the stylometrics of some plays are more similar to plays written before or after the year in which they were written. For instance, Romeo and Juliet is dated 1596, but its stylometrics are more similar to plays written by Shakespeare after 1600. The source code for the analysis is available for free download.
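As an illustration of the kind of measurement described above, the following minimal sketch computes one stylometric feature (mean sentence length) and the Pearson correlation between actual and predicted years. It is not the authors' implementation: the function names, the naive sentence splitter, and the year values are all assumptions made for this example, and the numbers are invented rather than taken from the study.

```python
import math
import re


def mean_sentence_length(text):
    """Average number of words per sentence.

    Assumption: sentences are split naively on '.', '!' and '?'.
    A real analysis would likely use a proper tokenizer.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    return sum(len(s.split()) for s in sentences) / len(sentences)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for illustration only; not data from the paper.
actual_years = [1592, 1596, 1600, 1605, 1611]
predicted_years = [1594, 1601, 1599, 1604, 1608]
r = pearson(actual_years, predicted_years)
```

In a setup like this, a play whose predicted year falls well after its accepted date would be one whose measured style resembles the later plays, which is the pattern the abstract reports for Romeo and Juliet.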