This paper introduces supervised machine learning to the literature that measures corporate culture from text documents. We compile a unique data set of employee reviews that human evaluators labeled according to the information each review reveals about the firm's corporate culture. Using this data set, we fine-tune state-of-the-art transformer-based language models to perform the same classification task. In out-of-sample predictions, our language models classify 16 to 28 percentage points more employee reviews in line with human evaluators than traditional text classification approaches do. We make our models publicly available.
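To make the headline comparison concrete, the following is a minimal sketch of how a percentage-point gain in agreement with human evaluators could be computed on a held-out sample. It is an illustration under stated assumptions, not the paper's evaluation code: the function names `agreement_rate` and `percentage_point_gain`, the label names, and the toy predictions are all hypothetical, and agreement is assumed here to mean the share of reviews whose predicted label matches the human label.

```python
from typing import Sequence


def agreement_rate(predicted: Sequence[str], human: Sequence[str]) -> float:
    """Share of reviews whose predicted label equals the human label."""
    if len(predicted) != len(human):
        raise ValueError("predicted and human must have the same length")
    if not human:
        raise ValueError("at least one labeled review is required")
    matches = sum(p == h for p, h in zip(predicted, human))
    return matches / len(human)


def percentage_point_gain(
    model_preds: Sequence[str],
    baseline_preds: Sequence[str],
    human: Sequence[str],
) -> float:
    """Difference in agreement rates, expressed in percentage points."""
    model_rate = agreement_rate(model_preds, human)
    baseline_rate = agreement_rate(baseline_preds, human)
    return 100.0 * (model_rate - baseline_rate)


# Hypothetical held-out labels; the category names are placeholders,
# not the labeling scheme used in the paper.
human_labels = ["culture_a", "culture_b", "culture_c", "culture_d", "culture_a"]
baseline_preds = ["culture_a", "culture_d", "culture_c", "culture_a", "culture_a"]
model_preds = ["culture_a", "culture_b", "culture_c", "culture_d", "culture_b"]

gain = percentage_point_gain(model_preds, baseline_preds, human_labels)
print(f"Gain over baseline: {gain:.1f} percentage points")
```

Reporting the difference in percentage points rather than as a relative percentage keeps the comparison independent of how well the baseline itself performs.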