This paper proposes a methodology for discovering meaningful properties in data by exploring the latent space of unsupervised deep generative models. We combine the manipulation of individual latent variables to extreme values with methods inspired by causal inference in an approach we call causal disentanglement with extreme values (CDEV), and show that this method yields insights for model interpretability. The approach lets us test which properties of unknown data the model encodes as meaningful; we use it to gain insight into the communication system of sperm whales (Physeter macrocephalus), one of the most intriguing and understudied animal communication systems. The network architecture used has been shown to learn meaningful representations of speech; here, it is used as a learning mechanism to decipher the properties of another vocal communication system, one for which we have no ground truth. The proposed methodology suggests that sperm whales encode information using the number of clicks in a sequence, the regularity of their timing, and audio properties such as the spectral mean and the acoustic regularity of the sequences. Some of these findings are consistent with existing hypotheses, while others are proposed for the first time. We also argue that our models uncover rules that govern the structure of units in the communication system and apply them when generating innovative data not shown during training. This paper suggests that interpreting the outputs of deep neural networks with causal inference methodology can be a viable strategy for approaching data about which little is known, and it presents another case of how deep learning can limit the hypothesis space. Finally, the proposed approach can be extended to other architectures and datasets.
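The core idea of forcing one latent variable to extreme values and observing the effect on an output property can be illustrated with a minimal sketch. This is not the paper's implementation: `toy_generator` is a hypothetical stand-in for a trained generative model, its dependence on `z[0]` and `z[1]` is built in by assumption, and the latent size, training range of [-1, 1], and least-squares slope are illustrative choices rather than details from the source.

```python
import random
import statistics


def toy_generator(z):
    """Hypothetical stand-in for a trained generator (not the paper's model).

    Returns click times in seconds. By construction, z[0] controls the
    number of clicks and z[1] controls the spacing between clicks.
    """
    n_clicks = max(1, round(5 + 0.5 * z[0]))
    interval = 0.2 + 0.01 * abs(z[1])
    return [i * interval for i in range(n_clicks)]


def count_clicks(clicks):
    """Output property to measure; here simply the number of clicks."""
    return len(clicks)


def intervene(generator, measure, dim, values, latent_size=4, n_samples=200, seed=0):
    """Force latent `dim` to each value, resample the other latents, and
    return the mean measurement per forced value."""
    rng = random.Random(seed)
    means = []
    for v in values:
        outcomes = []
        for _ in range(n_samples):
            z = [rng.uniform(-1.0, 1.0) for _ in range(latent_size)]
            z[dim] = v  # the intervention: overwrite one latent variable
            outcomes.append(measure(generator(z)))
        means.append(statistics.fmean(outcomes))
    return means


def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


# Values well outside the assumed training range of [-1, 1].
extreme_values = [-8.0, -4.0, 0.0, 4.0, 8.0]

# Estimated effect of each latent variable on the click count.
effects = {
    dim: slope(extreme_values, intervene(toy_generator, count_clicks, dim, extreme_values))
    for dim in range(4)
}
```

In this toy setting, only the latent variable that the generator was built to use for click count shows a nonzero slope. With a real model the same loop would be run over measured acoustic properties of the generated audio, and the estimated effects would need proper statistical testing.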