Ambivalence/Hesitancy Recognition in Videos for Personalized Digital Health Interventions

Manuela González-González, Soufiane Belharbi, Muhammad Osama Zeeshan, Masoumeh Sharafi, Muhammad Haseeb Aslam, Lorenzo Sia, Nicolas Richet, Marco Pedersoli, Alessandro Lameiras Koerich, Simon L Bacon, Eric Granger

Source: arXiv. 13 pages, 3 figures. arXiv admin note: substantial text overlap with arXiv:2505.19328.

Grounded in behavioural science, health interventions target behaviour change by providing a framework that helps patients acquire and maintain healthy habits that improve medical outcomes. In-person interventions are costly and difficult to scale, especially in resource-limited regions. Digital health interventions offer a cost-effective alternative, potentially supporting independent living and self-management. Automating such interventions, especially through machine learning, has recently gained considerable attention. Ambivalence and hesitancy (A/H) play a primary role in individuals delaying, avoiding, or abandoning health interventions. A/H are subtle, conflicting emotions that place a person between positive and negative evaluations of a behaviour, or between accepting and refusing to engage in it. They manifest as affective inconsistency across modalities or within a single modality, such as language, facial and vocal expressions, and body language. While experts can be trained to recognize A/H, integrating them into digital health interventions is costly and less effective. Automatic A/H recognition is therefore critical for the personalization and cost-effectiveness of digital health interventions. Here, we explore deep learning models for A/H recognition in videos, an inherently multimodal task. In particular, this paper covers three learning setups: supervised learning, unsupervised domain adaptation for personalization, and zero-shot inference via large language models (LLMs). Our experiments are conducted on the unique, recently published BAH video dataset for A/H recognition. Our results show limited performance, suggesting that better-adapted multimodal models are required for accurate A/H recognition. Better methods for spatio-temporal modeling and multimodal fusion are needed to leverage conflicts within and across modalities.


