Autism spectrum disorder (ASD) is a neurodevelopmental condition characterized by challenges in social communication, repetitive behaviors, and sensory processing. One important research area in ASD is evaluating children's behavioral changes over time during treatment. The standard protocol for this purpose is the Brief Observation of Social Communication Change (BOSCC), which involves dyadic interactions between a child and clinicians performing a pre-defined set of activities. A fundamental aspect of understanding children's behavior in these interactions is automatic speech understanding, particularly identifying who speaks and when. Conventional approaches in this area rely heavily on speech samples recorded from a spectator perspective, and there is limited research on egocentric speech modeling. In this study, we design an experiment to perform speech sampling in BOSCC interviews from an egocentric perspective using wearable sensors, and we explore pre-training on Ego4D speech samples to enhance child-adult speaker classification in dyadic interactions. Our findings highlight the potential of egocentric speech collection and pre-training to improve speaker classification accuracy.