Understanding how social situations unfold in people's daily lives is relevant to designing mobile systems that support users' personal goals, well-being, and activities. As an alternative to questionnaires, some studies have used passively collected smartphone sensor data to infer social context (i.e., whether a person is alone or not) with machine learning models. However, the few existing studies have focused on specific daily life occasions and on limited geographic cohorts in one or two countries, which limits our understanding of how well inference models generalize to everyday life occasions and across multiple countries. In this paper, we used a novel, large-scale, multimodal smartphone sensing dataset with over 216K self-reports collected from 581 young adults in five countries (Mongolia, Italy, Denmark, UK, Paraguay), first to examine whether social context inference is feasible with sensor data, and then to examine how behavioral and country-level diversity affects inference. We found that several sensors are informative of social context; that partially personalized multi-country models (trained and tested with data from all countries) and country-specific models (trained and tested within countries) achieve similar performance, above 90% AUC; and that models do not generalize well to unseen countries, regardless of geographic proximity. These findings confirm the importance of diversity in mobile data for better understanding social context inference models in different countries.
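The unseen-country evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the record layout (a `country` field per self-report), the function names, and the label convention (1 = alone, 0 = not alone) are all assumptions made for illustration. The sketch shows a leave-one-country-out split and a plain AUC computation based on the rank-sum identity.

```python
from collections import defaultdict


def auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity.

    labels: iterable of 0/1 (assumed here: 1 = alone, 0 = not alone).
    scores: model scores, higher meaning "more likely alone".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def leave_one_country_out(records):
    """Yield (held_out_country, train_records, test_records) splits.

    records: list of dicts with a hypothetical "country" key.
    Each country is held out once; the model would be trained on the rest.
    """
    by_country = defaultdict(list)
    for rec in records:
        by_country[rec["country"]].append(rec)
    for held_out in sorted(by_country):
        train = [r for c, rs in by_country.items() if c != held_out for r in rs]
        yield held_out, train, by_country[held_out]
```

A country-specific setup would instead split each country's records into train and test portions, and a multi-country setup would pool all countries before splitting; comparing the AUC of those setups against the held-out splits above is one way to quantify the generalization gap the abstract reports.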