Modern speaker recognition systems represent utterances as embedding vectors. Conventional embedding vectors are dense and unstructured. In this paper, we propose an ordered binary embedding approach that sorts the dimensions of the embedding vector via nested dropout and converts the sorted vectors to binary codes via Bernoulli sampling. The resulting ordered binary codes offer several important merits, such as support for hierarchical clustering, reduced memory usage, and fast retrieval. These merits were verified empirically in comprehensive experiments on a speaker identification task with the VoxCeleb and CN-Celeb datasets.
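The two steps named in the abstract, nested dropout and Bernoulli sampling, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the cut-off index is assumed to follow a truncated geometric distribution (as in the original nested dropout formulation), and the bit probabilities are assumed to come from a sigmoid over the embedding values. The embedding values are toy numbers, not real data.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def nested_dropout(vec, rng, p=0.1):
    """Keep a random-length prefix of `vec` and zero every later dimension.

    Assumption: the prefix length is geometric with parameter `p`,
    truncated to len(vec). Applied during training, this pushes the
    model to place the most useful information in the leading dimensions.
    """
    keep = 1
    while keep < len(vec) and rng.random() > p:
        keep += 1
    return [x if i < keep else 0.0 for i, x in enumerate(vec)]


def bernoulli_binarize(vec, rng):
    """Sample one bit per dimension with P(bit = 1) = sigmoid(value)."""
    return [1 if rng.random() < sigmoid(x) else 0 for x in vec]


def prefix_hamming(code_a, code_b, n_bits):
    """Hamming distance over the first `n_bits` bits only."""
    return sum(a != b for a, b in zip(code_a[:n_bits], code_b[:n_bits]))


rng = random.Random(0)
embedding = [2.3, -1.7, 0.4, -0.2, 1.1, -3.0, 0.0, 0.9]  # toy values
masked = nested_dropout(embedding, rng)   # training-time view of the vector
code = bernoulli_binarize(embedding, rng)  # binary code for storage/retrieval
```

Because the leading bits are meant to carry the most information, comparing only a short prefix with `prefix_hamming` gives a coarse match, and longer prefixes refine it. This is one way to read the hierarchical clustering and fast retrieval properties mentioned above.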