Language models have become increasingly popular for automatic creativity assessment, generating semantic distances that objectively measure the quality of creative ideas. However, an automatic assessment system for evaluating creative ideas in the Chinese language is currently lacking. To address this gap, we developed TransDis, a scoring system that uses transformer-based language models to provide valid originality (quality) and flexibility (variety) scores for Alternative Uses Task (AUT) responses in Chinese. Study 1 demonstrated that the latent model-rated originality factor, composed of three transformer-based models, strongly predicted human originality ratings, and that model-rated flexibility also correlated strongly with human flexibility ratings. Criterion validity analyses indicated that model-rated originality and flexibility correlated positively with other creativity measures, demonstrating validity similar to that of human ratings. Studies 2 and 3 showed that TransDis effectively distinguished participants instructed to provide creative vs. common uses (Study 2) and participants instructed to generate ideas in a flexible vs. persistent way (Study 3). Our findings suggest that TransDis can be a reliable and low-cost tool for measuring idea originality and flexibility in the Chinese language, potentially paving the way for automatic creativity assessment in other languages. We offer an open platform to compute originality and flexibility for AUT responses in Chinese and over 50 other languages (https://osf.io/59jv2/).
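To illustrate how semantic distance can yield the two kinds of scores, the following is a minimal sketch and not the TransDis implementation. It assumes that originality is scored as the cosine distance between the embedding of the AUT object and the embedding of each response, and that flexibility is scored as the mean cosine distance between consecutive responses; both definitions, the function names, and the toy three-dimensional vectors are illustrative assumptions. In practice the vectors would come from a transformer-based language model.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity for two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def originality(prompt_vec, response_vecs):
    """Assumed definition: distance of each response from the AUT object."""
    return [cosine_distance(prompt_vec, r) for r in response_vecs]


def flexibility(response_vecs):
    """Assumed definition: mean distance between consecutive responses."""
    pairs = list(zip(response_vecs, response_vecs[1:]))
    if not pairs:
        return 0.0  # fewer than two responses: no shift between ideas to measure
    return sum(cosine_distance(a, b) for a, b in pairs) / len(pairs)


# Toy embeddings (hypothetical); a real pipeline would encode text with a model.
prompt = [1.0, 0.0, 0.0]
responses = [
    [1.0, 0.0, 0.0],  # same direction as the prompt: a common use
    [0.0, 1.0, 0.0],  # orthogonal to the prompt: a distant use
    [0.0, 0.0, 1.0],  # orthogonal to both the prompt and the previous response
]

scores = originality(prompt, responses)
flex = flexibility(responses)
print(scores, flex)
```

Under these assumptions, a response whose embedding lies far from the object receives a higher originality score, and a participant who moves between semantically distant ideas receives a higher flexibility score.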