Purpose: Reasoning language models (RLMs) have demonstrated significant advances in solving complex reasoning tasks. We examined their potential to use case reports to assess parental cooperation during child protective services (CPS) interventions. Parental cooperation is a case factor characterized by ambiguous and conflicting information. Methods: We developed a four-stage workflow comprising (1) case report collection, (2) reasoning-based assessment of parental cooperation, (3) automated category extraction, and (4) case labeling. The performance of RLMs with different parameter sizes (255B, 32B, 4B) was compared against human-validated data. Two expert human reviewers (EHRs) independently classified a weighted random sample of reports. Results: The largest RLM achieved the highest accuracy (89%), outperforming the initial approach (80%). Classification accuracy was higher for mothers (93%) than for fathers (85%), and EHRs exhibited similar differences. Conclusions: RLMs' reasoning can effectively assess complex case factors such as parental cooperation. The lower accuracy in assessing fathers' cooperation supports the argument that professional attention in CPS interventions focuses more strongly on mothers.
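The comparison against human-validated data, broken down by parent, can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `accuracy_by_group`, the label strings, and the toy records are all hypothetical, and the abstract does not state the actual label set or data format.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return per-group accuracy for (group, predicted, reference) triples.

    `group` is the parent being assessed, `predicted` is the model's label,
    and `reference` is the human-validated label. All names are assumptions.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, predicted, reference in records:
        total[group] += 1
        if predicted == reference:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


# Hypothetical toy records, not data from the study.
records = [
    ("mother", "cooperative", "cooperative"),
    ("mother", "uncooperative", "uncooperative"),
    ("father", "cooperative", "uncooperative"),
    ("father", "cooperative", "cooperative"),
]

result = accuracy_by_group(records)
print(result)
```

Reporting accuracy per group rather than as a single pooled figure is what makes a gap such as the one between mothers and fathers visible.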