Dilated Convolution with Learnable Spacings

This thesis presents and evaluates the Dilated Convolution with Learnable Spacings (DCLS) method. Across supervised learning experiments in computer vision, audio, and speech processing, DCLS is shown to outperform both standard and advanced convolution techniques. The research is organized in several steps, beginning with a review of the literature and of the convolution techniques that preceded the development of DCLS. We focus in particular on the methods most closely related to our own, since they are essential for understanding the nuances and distinctive features of our approach. The cornerstone of the study is the introduction and application of DCLS to convolutional neural networks (CNNs), as well as to hybrid architectures that combine convolution with visual attention. DCLS is shown to be particularly effective in tasks such as classification, semantic segmentation, and object detection. The method initially relies on bilinear interpolation; the study also explores other interpolation methods and finds that Gaussian interpolation slightly improves performance. DCLS is further applied to spiking neural networks (SNNs) to enable synaptic delay learning within a neural network that could eventually be transferred to so-called neuromorphic chips. The results show that DCLS sets a new state of the art in SNN audio classification on certain benchmark tasks in this field, which involve datasets with a strong temporal component. In addition, we show that DCLS can significantly improve the accuracy of artificial neural networks on the multi-label audio classification task. We conclude with a discussion of the chosen experimental setup and its limitations, the limitations of our method, and our results.
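To make the role of interpolation more concrete, the following is a minimal one-dimensional sketch, not the implementation used in the thesis. It assumes that each kernel element has a weight and a real-valued position, and that a dense kernel is built by spreading each weight over integer taps, either by linear interpolation (the 1D analogue of bilinear interpolation) or by a normalized Gaussian. The function names `dense_kernel_linear` and `dense_kernel_gaussian`, the clamping of positions, and the fixed `sigma` are illustrative assumptions.

```python
import math


def dense_kernel_linear(weights, positions, size):
    """Build a dense 1D kernel of length `size` from sparse elements.

    Each weight sits at a real-valued position and is split between the
    two nearest integer taps in proportion to its distance from each.
    (Hypothetical helper, for illustration only.)
    """
    kernel = [0.0] * size
    for w, p in zip(weights, positions):
        p = min(max(p, 0.0), size - 1)  # assumption: clamp inside the kernel
        lo = int(math.floor(p))
        frac = p - lo
        kernel[lo] += w * (1.0 - frac)
        if frac > 0.0:  # the upper tap only receives weight off-grid
            kernel[lo + 1] += w * frac
    return kernel


def dense_kernel_gaussian(weights, positions, size, sigma=0.5):
    """Same idea, but each weight is spread over all taps by a Gaussian
    centred on its position and normalized to sum to one.
    (Hypothetical helper; `sigma` is a fixed illustrative value here.)
    """
    kernel = [0.0] * size
    for w, p in zip(weights, positions):
        p = min(max(p, 0.0), size - 1)  # assumption: clamp inside the kernel
        g = [math.exp(-0.5 * ((i - p) / sigma) ** 2) for i in range(size)]
        total = sum(g)
        for i in range(size):
            kernel[i] += w * g[i] / total
    return kernel


if __name__ == "__main__":
    # Two kernel elements placed in a dense kernel of length 5.
    print(dense_kernel_linear([1.0, 2.0], [0.0, 2.5], 5))
    print(dense_kernel_gaussian([1.0, 2.0], [0.0, 2.5], 5))
```

In both variants the dense kernel varies smoothly with the positions, which is what allows the spacings to be adjusted by gradient descent; the Gaussian variant spreads each weight over more than two taps.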