In this work, we study how to make mmWave radar presence detection more interpretable for Ambient Assisted Living (AAL) settings, where camera-based sensing raises privacy concerns. We propose a Generative Latent Alignment (GLA) framework that combines a lightweight convolutional variational autoencoder with a frozen CLIP text encoder to learn a low-dimensional latent representation of radar Range-Angle (RA) heatmaps. The latent space is softly aligned with two semantic anchors corresponding to "empty room" and "person present", and Grad-CAM is applied in this aligned latent space to visualize which spatial regions support each presence decision. On our mmWave radar dataset, we qualitatively observe that the "person present" class produces compact Grad-CAM blobs that coincide with strong RA returns, whereas "empty room" samples yield diffuse activations or no evidence. We also conduct an ablation in which the anchors are replaced with unrelated text prompts; this substitution degrades both reconstruction and localization, suggesting that radar-specific anchors are important for meaningful explanations in this setting.
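The soft alignment described above can be sketched as a cross-entropy over cosine similarities between a latent code and the two text-anchor embeddings. This is a minimal illustration, not the paper's exact formulation: the anchor vectors below are random stand-ins for frozen CLIP text embeddings, and the temperature `tau` is an assumed hyperparameter.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def soft_alignment_loss(z, anchors, label, tau=0.07):
    """Cross-entropy of a softmax over cosine similarities between a
    latent code z and the semantic anchors ("empty room", "person present").
    A low loss means z sits close to the anchor for its label."""
    sims = np.array([cosine(z, a) for a in anchors]) / tau
    logits = sims - sims.max()                      # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[label] + 1e-12)

rng = np.random.default_rng(0)
dim = 8
# Stand-ins for the two frozen CLIP text embeddings (illustrative only).
anchors = [rng.normal(size=dim), rng.normal(size=dim)]

# A latent code near the "person present" anchor (index 1) should incur
# a lower loss for label 1 than for label 0.
z = anchors[1] + 0.05 * rng.normal(size=dim)
loss_correct = soft_alignment_loss(z, anchors, label=1)
loss_wrong = soft_alignment_loss(z, anchors, label=0)
```

Under this objective, latents for "person present" frames are pulled toward one anchor and "empty room" frames toward the other, which is what makes Grad-CAM in the aligned latent space class-discriminative.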