In this paper, we apply the variational information bottleneck approach to end-to-end neural diarization with encoder-decoder attractors (EEND-EDA), which allows us to investigate what information is essential for the model. EEND-EDA utilizes attractors, vector representations of the speakers in a conversation. Our analysis shows that attractors do not necessarily have to contain speaker-characteristic information. On the other hand, giving the attractors more freedom to encode some extra (possibly speaker-specific) information leads to small but consistent diarization performance improvements. Despite architectural differences among EEND systems, the notions of attractors and frame embeddings are common to most of them and are not specific to EEND-EDA. We believe that the main conclusions of this work apply to other EEND variants as well. We therefore hope this paper will be a valuable contribution that guides the community toward more informed decisions when designing new systems.