We propose a method for sound source localization (SSL) of a source inside a structure using Ac-CycleGAN under unpaired data conditions. The method uses a large amount of simulated data and a small amount of experimental data to locate a sound source inside a structure in a real environment. The Ac-CycleGAN generator transforms simulated data into real data, and vice versa, using unpaired data from both domains. The Ac-CycleGAN discriminator is designed to distinguish the transformed data produced by the generator from real data while also predicting the location of the sound source. Vectors representing the frequency spectra of accelerometers (FSAs) measured at three points outside the structure are used as input data, and the source areas inside the structure are used as labels. The input vectors are concatenated vertically to form an image. Labels are defined by dividing the interior of the structure into eight areas and one-hot encoding each area. The SSL problem is thus recast as an image-classification problem that estimates the location of the sound source probabilistically. We show that the Ac-CycleGAN discriminator can estimate the sound source location from data that are unpaired across domains. We further analyze the factors the discriminator uses to distinguish the data. The proposed model achieved an accuracy exceeding 90% when trained on 80% of the actual data (corresponding to 12.5% of the simulated data). Although the domain transformation performed by the Ac-CycleGAN generator may be imperfect, the discriminator can still distinguish transformed data from real data by selectively using only those features with a relatively small transformation error.
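The data representation described above can be illustrated with a minimal sketch. It assumes NumPy, a hypothetical spectrum length of 256 bins, and randomly generated spectra in place of measured FSAs; the function names are illustrative and not taken from the paper. Only the counts (three measurement points, eight areas), the vertical concatenation, and the one-hot labels come from the text.

```python
import numpy as np

NUM_SENSORS = 3  # from the text: three measurement points outside the structure
NUM_AREAS = 8    # from the text: interior divided into eight areas


def build_input_image(spectra):
    """Stack one spectrum vector per sensor vertically into a 2-D image."""
    image = np.vstack(spectra)  # shape: (number of sensors, number of bins)
    if image.shape[0] != NUM_SENSORS:
        raise ValueError(f"expected {NUM_SENSORS} spectra, got {image.shape[0]}")
    return image


def one_hot_area(area_index):
    """Return the one-hot label for a source area with index 0..NUM_AREAS-1."""
    if not 0 <= area_index < NUM_AREAS:
        raise ValueError(f"area_index must be in [0, {NUM_AREAS})")
    label = np.zeros(NUM_AREAS, dtype=np.float32)
    label[area_index] = 1.0
    return label


def predict_area(class_probabilities):
    """Pick the most probable area from a vector of class probabilities."""
    return int(np.argmax(class_probabilities))


# Placeholder data: random values stand in for measured spectra (assumption).
rng = np.random.default_rng(0)
n_bins = 256  # hypothetical spectrum length, not given in the text
spectra = [rng.random(n_bins) for _ in range(NUM_SENSORS)]

image = build_input_image(spectra)
label = one_hot_area(5)
```

Here `image` is the classifier input and `label` its target; in the setting described above, the discriminator's class output would play the role of `class_probabilities` in `predict_area`.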