Smart-home sensor data holds significant potential for applications such as healthcare monitoring and assistive technologies. Existing approaches, however, face critical limitations. Supervised models require impractical amounts of labeled data. Foundation models for activity recognition focus only on inertial sensors and fail to address the unique characteristics of smart-home binary sensor events: their sparse, discrete nature combined with rich semantic associations. LLM-based approaches, although tested in this domain, require natural language descriptions or prompting and rely on either external services or expensive hardware, which makes them infeasible in real-life scenarios because of privacy and cost concerns. We introduce DomusFM, the first foundation model specifically designed and pretrained for smart-home sensor data. DomusFM employs a self-supervised dual contrastive learning paradigm to capture both token-level semantic attributes and sequence-level temporal dependencies. By integrating semantic embeddings from a lightweight language model with specialized encoders for temporal patterns and binary states, DomusFM learns generalizable representations that transfer across environments and across tasks related to activity and event analysis. Through leave-one-dataset-out evaluation on seven public smart-home datasets, we demonstrate that DomusFM outperforms state-of-the-art baselines on different downstream tasks, achieving superior performance even when only 5% of the labeled training data is available for fine-tuning. Our approach addresses data scarcity while maintaining practical deployability for real-world smart-home systems.
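To make the dual contrastive objective more concrete, the following is a minimal sketch of one plausible form, not the DomusFM implementation. It assumes an InfoNCE-style loss with in-batch negatives, applied once to token-level embeddings and once to sequence-level embeddings, and combined by a weighted sum. The function names, the `weight` and `temperature` parameters, and the choice of InfoNCE itself are illustrative assumptions; the abstract does not specify the exact loss.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch.

    positives[i] is the positive view for anchors[i]; every other entry of
    `positives` serves as an in-batch negative. (Assumed loss form.)
    """
    a = [_normalize(v) for v in anchors]
    p = [_normalize(v) for v in positives]
    total = 0.0
    for i, a_i in enumerate(a):
        # Cosine similarity of anchor i to every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(a_i, p_j)) / temperature for p_j in p]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(a)


def dual_contrastive_loss(tok_a, tok_p, seq_a, seq_p, weight=0.5, temperature=0.1):
    """Weighted sum of a token-level and a sequence-level contrastive term.

    `weight` is a hypothetical mixing coefficient, not a value from the paper.
    """
    token_term = info_nce(tok_a, tok_p, temperature)
    sequence_term = info_nce(seq_a, seq_p, temperature)
    return weight * token_term + (1.0 - weight) * sequence_term
```

In this sketch, correctly paired views yield a loss near zero, while mismatched pairs are penalized heavily, which is the behavior a contrastive pretraining objective relies on.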