SeSaMe: A Framework to Simulate Self-Reported Ground Truth for Mental Health Sensing Studies

Advances in mobile and wearable technologies have made it possible to passively monitor a person's mental, behavioral, and affective health. These approaches typically rely on longitudinal collection of self-reported outcomes, e.g., depression, stress, and anxiety, to train machine learning (ML) models. However, the need to self-report continuously places a significant burden on participants, often resulting in attrition, missing labels, or insincere responses. In this work, we introduce the Scale Scores Simulation using Mental Models (SeSaMe) framework to alleviate participants' burden in digital mental health studies. By leveraging pre-trained large language models (LLMs), SeSaMe enables the simulation of participants' responses on psychological scales. In SeSaMe, researchers prompt LLMs with information on participants' internal behavioral dispositions, enabling the LLMs to construct mental models of participants and simulate their responses on psychological scales. We demonstrate an application of SeSaMe in which we use GPT-4 to simulate responses on one scale, using responses from another scale as behavioral information. We also evaluate the alignment between human and SeSaMe-simulated responses to psychological scales. We then present experiments that examine the utility of SeSaMe-simulated responses as ground truth for training ML models by replicating established depression and anxiety screening tasks from a previous study. Our results indicate that SeSaMe is a promising approach, but that its alignment may vary across scales and specific prediction objectives. We also observed that models trained on simulated data performed on par with models trained on real data in most evaluation scenarios. We conclude by discussing the potential implications of SeSaMe for addressing some of the challenges researchers face with ground-truth collection in passive sensing studies.
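
The scale-to-scale application described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the prompt wording, the placeholder scale names and items, the answer options, and the use of mean absolute error as an alignment measure are all assumptions made for illustration. The LLM call itself is omitted; the sketch only builds a prompt from one scale's responses, parses a reply in an assumed "N. answer" format, and compares simulated item scores against human ones.

```python
from typing import Dict, List


def build_prompt(source_scale: str, source_answers: Dict[str, str],
                 target_scale: str, target_items: List[str],
                 options: List[str]) -> str:
    """Assemble a text prompt asking an LLM to answer the target items
    as the participant would (hypothetical wording, not from the paper)."""
    lines = [f"A participant answered the {source_scale} questionnaire as follows:"]
    for item, answer in source_answers.items():
        lines.append(f"- {item}: {answer}")
    lines.append(f"Answer each {target_scale} item as this participant would.")
    lines.append("Allowed answers: " + ", ".join(options))
    for i, item in enumerate(target_items, start=1):
        lines.append(f"{i}. {item}")
    return "\n".join(lines)


def parse_scores(reply: str, options: List[str]) -> List[int]:
    """Map each 'N. answer' line of a reply to its 0-based option index.
    The reply format is an assumption; real LLM output needs validation."""
    scores = []
    for line in reply.strip().splitlines():
        _, _, answer = line.partition(".")
        scores.append(options.index(answer.strip()))
    return scores


def mean_absolute_error(human: List[int], simulated: List[int]) -> float:
    """One simple (assumed) alignment measure between item-level scores."""
    return sum(abs(h - s) for h, s in zip(human, simulated)) / len(human)


# Placeholder data: scale names, items, and options are invented.
OPTIONS = ["never", "sometimes", "often"]
prompt = build_prompt(
    "Scale A", {"I feel tense": "often", "I sleep well": "never"},
    "Scale B", ["I feel low", "I enjoy my hobbies"], OPTIONS,
)
simulated = parse_scores("1. often\n2. never", OPTIONS)  # stand-in for an LLM reply
human = [2, 1]                                           # stand-in human item scores
error = mean_absolute_error(human, simulated)
```

In practice the prompt would be sent to a model such as GPT-4, and the reply would need to be checked before parsing, since a model may not follow the requested answer format.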
