Objective: This study examines how well leading Chinese and Western large language models (LLMs) understand and apply Chinese social work principles, focusing on their foundational knowledge in a non-Western professional setting. We test whether the cultural context of a developing country influences model reasoning and accuracy. Method: We used a published self-study version of the Chinese National Social Work Examination (160 questions), covering jurisprudence and applied knowledge, and administered it under three testing conditions to eight cloud-based LLMs, four Chinese and four Western. We scored their responses according to official guidelines and evaluated the reasoning quality of their explanations. Results: Seven of the eight models exceeded the 60-point passing threshold in both sections. Chinese models scored higher in jurisprudence (median = 77.0 vs. 70.3) but slightly lower in applied knowledge (median = 65.5 vs. 67.0). Both groups showed cultural biases, particularly regarding gender equality and family dynamics. Models demonstrated strong knowledge of professional terminology but struggled with culturally specific interventions. The share of incorrect answers that nonetheless contained valid reasoning ranged from 16.4% to 45.0%. Conclusions: Although both Chinese and Western models show foundational knowledge of Chinese social work principles, proficiency in technical language does not ensure cultural competence. Chinese models have an advantage in regulatory content, yet both groups struggle with culturally nuanced practice scenarios. These findings can inform the responsible integration of AI into cross-cultural social work practice.