Wearable Human Activity Recognition (WHAR) is a prominent research area within ubiquitous computing. Synchronous multi-sensor measurement has proven more effective for WHAR than relying on a single sensor. However, existing WHAR methods apply shared convolutional kernels to every sensor variable indiscriminately when extracting temporal features, and therefore fail to capture the spatio-temporal relationships among intra-sensor and inter-sensor variables. We propose DecomposeWHAR, a model consisting of a decomposition phase and a fusion phase, to better model the relationships between modality variables. The decomposition phase creates a high-dimensional representation of each intra-sensor variable through an improved depthwise separable convolution, capturing local temporal features while preserving each variable's unique characteristics. The fusion phase begins by capturing the relationships between intra-sensor variables and fusing their features at both the channel and variable levels. Long-range temporal dependencies are then modeled with a State Space Model (SSM), after which cross-sensor interactions are captured dynamically through a self-attention mechanism that highlights inter-sensor spatial correlations. Our model demonstrates superior performance on three widely used WHAR datasets, significantly outperforming state-of-the-art models while maintaining acceptable computational efficiency. Our code and supplementary materials are available at https://github.com/Anakin2555/DecomposeWHAR.
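The contrast between a shared kernel and per-variable kernels can be illustrated with a minimal pure-Python sketch. It shows only the plain depthwise step of a depthwise separable convolution, not the improved variant the abstract refers to, whose details are not given here. The function names, the toy window, and the kernel values are all assumptions made for illustration.

```python
def conv1d_valid(signal, kernel):
    """Cross-correlate one 1-D signal with one kernel (no padding, stride 1)."""
    k = len(kernel)
    return [
        sum(signal[t + i] * kernel[i] for i in range(k))
        for t in range(len(signal) - k + 1)
    ]


def shared_kernel_conv(x, kernel):
    """Baseline: the same kernel filters every sensor variable."""
    return [conv1d_valid(variable, kernel) for variable in x]


def depthwise_conv(x, kernels):
    """Depthwise step: variable i is filtered only by its own kernels[i]."""
    if len(x) != len(kernels):
        raise ValueError("need exactly one kernel per variable")
    return [conv1d_valid(v, k) for v, k in zip(x, kernels)]


# Hypothetical window: 3 variables of one sensor, 6 time steps each.
window = [
    [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],  # steadily rising signal
    [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],  # constant signal
    [0.0, 2.0, 0.0, 2.0, 0.0, 2.0],  # oscillating signal
]

# Hypothetical kernels, one per variable (in a trained model these are learned).
per_variable_kernels = [
    [-1.0, 0.0, 1.0],    # responds to slope
    [0.25, 0.5, 0.25],   # smooths
    [-1.0, 2.0, -1.0],   # responds to oscillation
]

shared = shared_kernel_conv(window, [-1.0, 0.0, 1.0])
features = depthwise_conv(window, per_variable_kernels)
```

With the shared slope kernel, the constant and oscillating variables both map to all-zero outputs and become indistinguishable, whereas the per-variable kernels keep a distinct response for each variable. A full depthwise separable convolution would follow this step with a pointwise (1x1) convolution that mixes the per-variable outputs.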