A Novel Deep Hybrid Framework with Ensemble-Based Feature Optimization for Robust Real-Time Human Activity Recognition

Real-time Human Activity Recognition (HAR) has wide-ranging applications in context-aware environments, public safety, assistive technologies, and autonomous monitoring and surveillance systems. However, existing real-time HAR systems face significant challenges, including limited scalability and high computational cost arising from redundant features. To address these issues, the Inception-V3 model is customized with region-based and boundary-aware operations, implemented with average pooling and max pooling, respectively, to enhance region homogeneity, suppress noise, and capture discriminative local features, while down-sampling improves robustness. To encode motion dynamics, an Attention-Augmented Long Short-Term Memory (AA-LSTM) network learns temporal dependencies across video frames. The features extracted from the video dataset are then optimized by a proposed dynamic composite feature selection method, Adaptive Dynamic Fitness Sharing and Attention (ADFSA). ADFSA is embedded within a genetic algorithm and selects a compact subset of features by dynamically balancing four objectives: accuracy, redundancy reduction, feature uniqueness, and complexity minimization. The resulting subset of diverse, discriminative features enables lightweight machine learning classifiers to achieve accurate and robust HAR in heterogeneous environments. Experimental results demonstrate up to 99.65% accuracy using as few as seven selected features, with improved inference time, on the challenging UCF-YouTube dataset, which includes occlusion, cluttered backgrounds, complex motion dynamics, and poor illumination.
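
To make the idea of a composite fitness inside a genetic algorithm more concrete, the following is a minimal sketch and not the ADFSA method itself, whose details the abstract does not give. It scores a binary feature mask by combining accuracy, a redundancy penalty, and a subset-size penalty, and then applies classic fitness sharing over a population of masks. The weights `w_acc`, `w_red`, `w_size`, the niche radius `sigma`, the nearest-centroid classifier, and all function names are assumptions chosen for illustration; the weights are fixed here, whereas the abstract describes a dynamic balance.

```python
import numpy as np

def nearest_centroid_accuracy(X, y, mask):
    """Training-set accuracy of a nearest-centroid classifier on the selected columns."""
    Xs = X[:, mask]
    classes = np.unique(y)
    centroids = np.stack([Xs[y == c].mean(axis=0) for c in classes])
    # Distance of every sample to every class centroid: shape (n_samples, n_classes).
    d = np.linalg.norm(Xs[:, None, :] - centroids[None, :, :], axis=2)
    pred = classes[d.argmin(axis=1)]
    return float((pred == y).mean())

def redundancy(X, mask):
    """Mean absolute pairwise correlation among selected features (0 if fewer than 2)."""
    k = int(mask.sum())
    if k < 2:
        return 0.0
    corr = np.corrcoef(X[:, mask], rowvar=False)
    off_diag = np.abs(corr[~np.eye(k, dtype=bool)])
    return float(off_diag.mean())

def composite_fitness(X, y, mask, w_acc=1.0, w_red=0.3, w_size=0.2):
    """Hypothetical weighted-sum fitness: reward accuracy, penalize redundancy and size."""
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        return 0.0  # an empty subset cannot classify anything
    acc = nearest_centroid_accuracy(X, y, mask)
    red = redundancy(X, mask)
    size = mask.sum() / mask.size  # fraction of features kept
    # Clip at zero so that fitness sharing (a division) keeps its usual meaning.
    return max(0.0, w_acc * acc - w_red * red - w_size * size)

def shared_fitness(raw, population, sigma=0.3):
    """Classic fitness sharing: divide each raw fitness by its niche count."""
    pop = np.asarray(population, dtype=bool)
    # Normalized Hamming distance between every pair of masks.
    dist = (pop[:, None, :] != pop[None, :, :]).mean(axis=2)
    sharing = np.where(dist < sigma, 1.0 - dist / sigma, 0.0)
    niche = sharing.sum(axis=1)  # at least 1: each mask is at distance 0 from itself
    return np.asarray(raw, dtype=float) / niche
```

In a genetic algorithm loop, `composite_fitness` would be evaluated for each candidate mask and `shared_fitness` applied to the whole population before selection, so that near-identical masks split their reward and distinct feature subsets survive longer. A held-out or cross-validated accuracy would normally replace the training-set accuracy used here for brevity.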
