AI and Cultural Context: An Empirical Investigation of Large Language Models' Performance on Chinese Social Work Professional Standards

Objective: This study examines how well leading Chinese and Western large language models understand and apply Chinese social work principles, focusing on their foundational knowledge within a non-Western professional setting. We test whether the cultural context of a developing country influences model reasoning and accuracy. Method: Using a published self-study version of the Chinese National Social Work Examination (160 questions) covering jurisprudence and applied knowledge, we tested eight cloud-based large language models (four Chinese and four Western) under three testing conditions. We scored their responses according to the official guidelines and evaluated the reasoning quality of their explanations. Results: Seven models exceeded the 60-point passing threshold in both sections. Chinese models performed better in jurisprudence (median = 77.0 vs. 70.3) but scored slightly lower in applied knowledge (median = 65.5 vs. 67.0). Both groups showed cultural biases, particularly regarding gender equality and family dynamics. Models demonstrated strong knowledge of professional terminology but struggled with culturally specific interventions. The share of incorrect answers that nonetheless contained valid reasoning ranged from 16.4% to 45.0%. Conclusions: While both Chinese and Western models show foundational knowledge of Chinese social work principles, proficiency in technical language does not ensure cultural competence. Chinese models have an advantage in regulatory content, yet both groups struggle with culturally nuanced practice scenarios. These findings can inform the responsible integration of AI into cross-cultural social work practice.

