This paper introduces transformer-based language models to the literature on measuring corporate culture from text documents. We compile a unique data set of employee reviews that human evaluators labeled according to the information the reviews reveal about the firms' corporate culture. Using this data set, we fine-tune state-of-the-art transformer-based language models to perform the same classification task. In out-of-sample predictions, our language models classify 17 to 30 percentage points more employee reviews in line with human evaluators than traditional text classification approaches do. We make our models publicly available.
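To make the "percentage points" comparison concrete, the following minimal sketch computes the share of reviews on which a classifier agrees with the human label and then the gap between two classifiers. It is an illustration under stated assumptions, not the evaluation code used in the paper: the function names `agreement_rate` and `percentage_point_gap`, the placeholder labels `"A"`, `"B"`, `"C"`, and the toy predictions are all hypothetical.

```python
def agreement_rate(predicted, human):
    """Return the percentage of reviews whose predicted label equals the human label."""
    if len(predicted) != len(human):
        raise ValueError("predicted and human must have the same length")
    if not human:
        raise ValueError("at least one labeled review is required")
    matches = sum(p == h for p, h in zip(predicted, human))
    return 100.0 * matches / len(human)


def percentage_point_gap(model_pred, baseline_pred, human):
    """Difference in agreement rates (model minus baseline), in percentage points."""
    return agreement_rate(model_pred, human) - agreement_rate(baseline_pred, human)


# Hypothetical held-out reviews with made-up culture labels (not data from the paper).
human_labels = ["A", "B", "C", "A", "B"]
transformer_preds = ["A", "B", "C", "A", "C"]  # agrees on 4 of 5 reviews
baseline_preds = ["A", "C", "C", "B", "C"]     # agrees on 2 of 5 reviews

transformer_rate = agreement_rate(transformer_preds, human_labels)
baseline_rate = agreement_rate(baseline_preds, human_labels)
gap = percentage_point_gap(transformer_preds, baseline_preds, human_labels)

print(f"transformer: {transformer_rate:.1f}%")
print(f"baseline:    {baseline_rate:.1f}%")
print(f"gap:         {gap:.1f} percentage points")
```

In this toy case the gap is a difference between two rates (80% minus 40%), which is what distinguishes a percentage-point gain from a relative percentage gain.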