Integrating inertial measurement units (IMUs) with large language models (LLMs) advances multimodal AI by enabling richer understanding of human activity. We introduce SensorCaps, a dataset of 26,288 IMU-derived activity narrations, and OpenSQA, an instruction-following dataset with 257,562 question-answer pairs. Combining LIMU-BERT and Llama, we develop LLaSA, a Large Multimodal Agent capable of interpreting and responding to queries about activity and motion analysis. Our evaluation demonstrates LLaSA's effectiveness in activity classification and question answering, highlighting its potential in healthcare, sports science, and human-computer interaction. These contributions advance sensor-aware language models and open new research avenues. Our code repository and datasets are available at https://github.com/BASHLab/LLaSA.
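To make the described architecture concrete, the sketch below illustrates one common way a pretrained IMU encoder (such as a LIMU-BERT-style model) can be coupled to an LLM: its sequence embeddings are projected into the LLM's token-embedding space and prepended to the text embeddings. This is a minimal, hypothetical PyTorch sketch; the class name, feature dimensions, and fusion scheme are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class IMUToLLMAdapter(nn.Module):
    """Hypothetical projection mapping IMU encoder embeddings into an LLM's
    token-embedding space, so sensor features can be consumed as soft tokens."""

    def __init__(self, imu_dim: int = 72, llm_dim: int = 4096):
        super().__init__()
        # Single linear projection; real systems may use an MLP or cross-attention.
        self.proj = nn.Linear(imu_dim, llm_dim)

    def forward(self, imu_embeddings: torch.Tensor) -> torch.Tensor:
        # imu_embeddings: (batch, seq_len, imu_dim) -> (batch, seq_len, llm_dim)
        return self.proj(imu_embeddings)

# Usage sketch with random tensors standing in for real IMU encoder outputs.
adapter = IMUToLLMAdapter()
fake_imu_features = torch.randn(1, 120, 72)   # 120 IMU frames, 72-dim features (assumed)
soft_tokens = adapter(fake_imu_features)      # would be prepended to the LLM's text embeddings
print(soft_tokens.shape)                      # torch.Size([1, 120, 4096])
```

The projected "soft tokens" would be concatenated with the embedded question text before being passed to the language model, which is the standard pattern for multimodal adapters of this kind.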