Millimeter-wave (mmWave) radar enables privacy-preserving human activity recognition (HAR), yet real-world deployment remains hindered by costly annotation and poor transferability under domain shift. Although prior efforts partially alleviate these challenges, most still require retraining or adaptation for each new deployment setting. This traps mmWave HAR in a repeated collect-tune-redeploy cycle, making scalable real-world deployment difficult. In this paper, we present RAGent, a framework for mmWave HAR that requires no training at deployment time and reformulates recognition as evidence-grounded inference over reusable radar knowledge rather than as deployment-specific model optimization. Offline, RAGent constructs a reusable radar knowledge base through constrained cross-modal supervision: a Vision-Language Model (VLM) transfers activity semantics from synchronized videos to paired radar segments, eliminating manual radar annotation. At deployment time, RAGent recognizes activities from radar alone by retrieving physically comparable precedents in an explicit kinematic space and resolving the final label through structured multi-role reasoning. The reasoning protocol is further refined offline through zero-gradient self-evolution. Extensive experiments on a self-collected dataset show that RAGent achieves 93.39% accuracy without per-domain retraining or target-domain adaptation, while generalizing robustly across domains.
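The deployment-time retrieval step (finding physically comparable precedents in an explicit kinematic space) can be illustrated with a minimal nearest-neighbor sketch. It rests on several assumptions not stated in the abstract: each knowledge-base entry is represented as a fixed-length kinematic descriptor paired with its VLM-derived label, similarity is z-scored Euclidean distance, and the names `Precedent` and `retrieve_precedents` as well as the feature layout are hypothetical. The knowledge-base values below are made-up toy numbers, not data from the paper. The sketch stops at returning the retrieved evidence, because the abstract states that the final label is resolved by structured multi-role reasoning rather than by a simple vote.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class Precedent:
    """Hypothetical knowledge-base entry: a kinematic descriptor plus its VLM-derived label."""
    # Assumed toy layout: (mean radial velocity, velocity spread, height change).
    features: tuple
    label: str


def _column_stats(rows):
    """Per-dimension mean and population std, so features with different units share one scale."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    stds = []
    for d in range(dims):
        var = sum((r[d] - means[d]) ** 2 for r in rows) / n
        stds.append(math.sqrt(var) or 1.0)  # fall back to 1.0 if a dimension has zero spread
    return means, stds


def retrieve_precedents(query, kb, k=3):
    """Return (label, distance) for the k entries closest to `query` under z-scored Euclidean distance."""
    means, stds = _column_stats([p.features for p in kb])

    def z(vec):
        return [(x - m) / s for x, m, s in zip(vec, means, stds)]

    zq = z(query)
    scored = []
    for p in kb:
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(zq, z(p.features))))
        scored.append((dist, p))
    scored.sort(key=lambda t: t[0])  # stable sort keeps knowledge-base order on ties
    return [(p.label, dist) for dist, p in scored[:k]]


# Illustrative toy knowledge base (invented values, for demonstration only).
kb = [
    Precedent((1.2, 0.3, 0.0), "walking"),
    Precedent((1.1, 0.4, 0.0), "walking"),
    Precedent((0.1, 0.2, -0.5), "sitting down"),
    Precedent((0.2, 0.1, 0.5), "standing up"),
]
print(retrieve_precedents((1.15, 0.35, 0.0), kb, k=2))
```

Standardizing each dimension before measuring distance is one simple way to keep a feature with a large numeric range from dominating the comparison; in a system like the one described, the retrieved (label, distance) pairs would be passed as evidence to the downstream reasoning stage instead of being voted on directly.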