Language models have become increasingly popular for automatic creativity assessment, generating semantic distances to objectively measure the quality of creative ideas. However, an automatic assessment system for evaluating creative ideas in the Chinese language is currently lacking. To address this gap, we developed TransDis, a scoring system that uses transformer-based language models to provide valid originality (quality) and flexibility (variety) scores for Alternative Uses Task (AUT) responses in Chinese. Study 1 demonstrated that the latent model-rated originality factor, composed of three transformer-based models, strongly predicted human originality ratings, and that model-rated flexibility also correlated strongly with human flexibility ratings. Criterion validity analyses indicated that model-rated originality and flexibility correlated positively with other creativity measures, demonstrating validity similar to that of human ratings. Studies 2 and 3 showed that TransDis effectively distinguished participants instructed to provide creative vs. common uses (Study 2) and participants instructed to generate ideas in a flexible vs. persistent way (Study 3). Our findings suggest that TransDis can be a reliable and low-cost tool for measuring idea originality and flexibility in Chinese, potentially paving the way for automatic creativity assessment in other languages. We offer an open platform to compute originality and flexibility for AUT responses in Chinese and over 50 other languages (https://osf.io/59jv2/).
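To make the idea of semantic-distance scoring concrete, the following is a minimal sketch and not the TransDis implementation. The abstract does not give the scoring formulas, so two assumptions are made here: originality is taken as the cosine distance between the embedding of the AUT prompt object and the embedding of a response, and flexibility is taken as the mean cosine distance between successive responses from one participant. The function names and the three-dimensional toy vectors are hypothetical stand-ins for embeddings that a transformer-based model would supply.

```python
import math


def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def originality(prompt_vec, response_vec):
    """Assumed originality score: semantic distance from prompt to response."""
    return cosine_distance(prompt_vec, response_vec)


def flexibility(response_vecs):
    """Assumed flexibility score: mean distance between successive responses."""
    if len(response_vecs) < 2:
        return 0.0
    steps = [cosine_distance(a, b)
             for a, b in zip(response_vecs, response_vecs[1:])]
    return sum(steps) / len(steps)


# Hypothetical toy embeddings; real ones would come from a language model.
brick = [1.0, 0.0, 0.0]
build_wall = [0.9, 0.1, 0.0]    # a common use, close to the prompt
paperweight = [0.2, 0.9, 0.1]   # a less typical use, farther from the prompt

common_score = originality(brick, build_wall)
creative_score = originality(brick, paperweight)
varied_set = flexibility([build_wall, paperweight, build_wall])
repetitive_set = flexibility([build_wall, build_wall, build_wall])
```

Under these assumptions, a response that is semantically farther from the prompt receives a higher originality score, and a response set that moves between distant ideas receives a higher flexibility score than one that repeats the same idea.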