In this paper, we apply the variational information bottleneck approach to end-to-end neural diarization with encoder-decoder attractors (EEND-EDA). This allows us to investigate what information is essential for the model. EEND-EDA represents each speaker in a conversation with a vector called an attractor. Our analysis shows that attractors do not necessarily have to contain speaker-characteristic information. On the other hand, giving the attractors more freedom, so that they can encode some extra (possibly speaker-specific) information, leads to small but consistent improvements in diarization performance. Although EEND systems differ in architecture, the notions of attractors and frame embeddings are common to most of them and are not specific to EEND-EDA. We therefore believe that the main conclusions of this work can apply to other variants of EEND, and we hope this paper will help the community make more informed decisions when designing new systems.