Many neural-network-based models for multi-source localization and tracking use one or more recurrent layers in their final stages to track the movement of the sources. Conventional recurrent neural networks (RNNs), such as long short-term memory (LSTM) networks and gated recurrent units (GRUs), take a vector as input and store their state in another vector. However, this approach packs the information about all the sources into a single ordered vector, which is not optimal for permutation-invariant problems such as multi-source tracking. In this paper, we present a new recurrent architecture that uses unordered sets to represent both its input and its state, and that is invariant to permutations of the input set and equivariant to permutations of the state set. Hence, the information about each sound source is represented in an individual embedding, and new estimates are assigned to the tracked trajectories regardless of their order.
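The abstract does not give the cell's equations, so the following is only a minimal, hypothetical sketch of the two symmetry properties it names, not the authors' architecture. It assumes each tracked source is a small embedding vector, uses softmax attention pooling over the input set as a stand-in aggregation, and replaces all learned parameters with a fixed blend. The names `set_rnn_step`, `sets_close`, `STATE`, and `INPUTS` are illustrative.

```python
import math
from typing import List

Vector = List[float]

# Toy example data (hypothetical): two tracked sources, three unordered estimates.
STATE: List[Vector] = [[1.0, 0.0], [0.0, 1.0]]
INPUTS: List[Vector] = [[0.9, 0.1], [0.2, 0.8], [-0.5, 0.3]]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def set_rnn_step(state: List[Vector], inputs: List[Vector]) -> List[Vector]:
    """Hypothetical set-based recurrent update (no learned weights).

    Assumes a non-empty input set. Each state element attends over the whole
    input set, and the same rule is applied to every state element.
    """
    new_state: List[Vector] = []
    for h in state:  # identical rule per element -> permutation equivariance
        scores = [dot(h, x) for x in inputs]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        # Softmax-weighted sum over the input set: the order of `inputs` does not matter.
        msg = [sum(w * x[i] for w, x in zip(weights, inputs)) / total for i in range(len(h))]
        # Fixed blend of the old embedding and the pooled message, squashed by tanh.
        new_state.append([math.tanh(0.5 * hi + 0.5 * mi) for hi, mi in zip(h, msg)])
    return new_state


def sets_close(a: List[Vector], b: List[Vector], tol: float = 1e-12) -> bool:
    """Element-wise comparison; the tolerance absorbs float summation-order effects."""
    return len(a) == len(b) and all(
        len(u) == len(v) and all(math.isclose(x, y, abs_tol=tol) for x, y in zip(u, v))
        for u, v in zip(a, b)
    )


out = set_rnn_step(STATE, INPUTS)
# Invariance: reordering the input set leaves every updated embedding unchanged.
print(sets_close(out, set_rnn_step(STATE, INPUTS[::-1])))
# Equivariance: reversing the state set reverses the output in the same way.
print(sets_close(out[::-1], set_rnn_step(STATE[::-1], INPUTS)))
```

Both checks print `True`. A trained model would presumably learn the pooling and update functions and would also need to handle sources appearing and disappearing; neither is modeled in this sketch.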