Implicit Neural Representations (INRs) have emerged as a promising method for representing diverse data modalities, including 3D shapes, images, and audio. While recent research has demonstrated successful applications of INRs in image and 3D shape compression, their potential for audio compression remains largely unexplored. Motivated by this gap, we present a preliminary investigation into the use of INRs for audio compression. Our study introduces Siamese SIREN, a novel approach based on the popular SIREN architecture. Our experimental results indicate that Siamese SIREN achieves higher audio reconstruction fidelity with fewer network parameters than previous INR architectures.