This study presents a novel robot-led approach to assessing children's mental wellbeing using a Vision Language Model (VLM). Inspired by the Child Apperception Test (CAT), the social robot NAO presented children with pictorial stimuli to elicit verbal narratives about the images, which a VLM then evaluated in accordance with CAT assessment guidelines. The VLM's assessments were systematically compared with those of a trained psychologist. The results reveal that while the VLM shows moderate reliability in identifying cases with no wellbeing concerns, its ability to accurately classify cases of clinical concern remains limited. Moreover, although the model's performance was generally consistent when prompted with varying demographic factors such as age and gender, a significantly higher false positive rate was observed for girls, indicating potential sensitivity to the gender attribute. These findings highlight both the promise and the challenges of integrating VLMs into robot-led assessments of children's wellbeing.
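To make the group-wise comparison concrete, the following minimal Python sketch shows one way a false positive rate per demographic group could be computed when the psychologist's label is taken as the reference. The function name, the record layout, and the toy values are hypothetical and serve only as an illustration; they are not data or code from the study.

```python
from collections import defaultdict


def false_positive_rate_by_group(records):
    """Compute the false positive rate per group.

    records: iterable of (group, reference_label, model_label) tuples, where
    each label is True for "clinical concern" and False for "no concern".
    The reference label is assumed to come from the human assessor.
    """
    false_positives = defaultdict(int)
    negatives = defaultdict(int)
    for group, reference, predicted in records:
        if not reference:              # only reference-negative cases count
            negatives[group] += 1
            if predicted:              # model flagged a concern that the reference did not
                false_positives[group] += 1
    # Groups without any reference-negative case are omitted (rate undefined).
    return {g: false_positives[g] / n for g, n in negatives.items()}


# Made-up toy records, for illustration only.
toy_records = [
    ("girl", False, True),
    ("girl", False, False),
    ("girl", True, True),
    ("boy", False, False),
    ("boy", False, False),
    ("boy", True, False),
]
rates = false_positive_rate_by_group(toy_records)
print(rates)
```

Whether a difference between groups is statistically significant would require a separate test on the underlying counts, which this sketch does not attempt.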