Smart home device detection is a critical aspect of human-computer interaction. However, detecting targets indoors is challenging because of interference from ambient light and background noise. In this paper, we present FSA-YOLOv5, a new model that addresses the limitations of traditional convolutional neural networks by introducing a Transformer to learn long-range dependencies. We also propose a new attention module, the full-separation attention module, which integrates spatial- and channel-dimension information to learn contextual information. To improve the detection of tiny devices, we add an extra prediction head for the indoor smart home device detection task. Finally, we release the Southeast University Indoor Smart Speaker Dataset (SUSSD) to supplement existing data samples. A series of experiments on SUSSD shows that our method outperforms the compared methods, demonstrating the effectiveness of FSA-YOLOv5.
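The abstract does not describe the internal structure of the full-separation attention module. To show roughly what "integrating spatial and channel information" can mean, the following sketch implements a simplified, parameter-free variant of the familiar CBAM-style sequential attention: channel weighting first, then spatial weighting. It is an assumption-laden illustration, not the authors' module. The function names are hypothetical, and practical modules learn an MLP for the channel branch and a convolution for the spatial branch. Both are omitted here, and pooled statistics are simply summed before the sigmoid.

```python
import math
from typing import List

# Feature map indexed as [channel][row][col].
Tensor3D = List[List[List[float]]]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def channel_attention(x: Tensor3D) -> List[float]:
    """One weight per channel from global average- and max-pooling over H x W.

    Hypothetical simplification: no learned MLP, pooled values are just summed.
    """
    weights = []
    for plane in x:
        values = [v for row in plane for v in row]
        avg = sum(values) / len(values)
        mx = max(values)
        weights.append(sigmoid(avg + mx))
    return weights


def spatial_attention(x: Tensor3D) -> List[List[float]]:
    """One weight per (row, col) from average- and max-pooling across channels.

    Hypothetical simplification: no learned convolution over the pooled maps.
    """
    c, h, w = len(x), len(x[0]), len(x[0][0])
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            column = [x[k][i][j] for k in range(c)]
            row.append(sigmoid(sum(column) / c + max(column)))
        out.append(row)
    return out


def channel_spatial_attention(x: Tensor3D) -> Tensor3D:
    """Reweight x by channel attention first, then by spatial attention."""
    cw = channel_attention(x)
    # Scale every value in channel k by its channel weight.
    y = [[[v * cw[k] for v in row] for row in plane] for k, plane in enumerate(x)]
    sw = spatial_attention(y)
    # Scale every channel at position (i, j) by the shared spatial weight.
    h, w = len(sw), len(sw[0])
    return [[[y[k][i][j] * sw[i][j] for j in range(w)] for i in range(h)]
            for k in range(len(y))]


if __name__ == "__main__":
    feature = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
    print(channel_spatial_attention(feature))
```

Because every weight lies in (0, 1), the module can only attenuate activations. Channels and positions with strong responses are attenuated least, which is the intended "focus" effect. The output keeps the input's shape, so a block like this can be dropped between existing layers of a detector.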