We present VISTA (Visualization of Internal States and Their Associations), a novel pipeline for visually exploring and interpreting neural network representations. VISTA addresses the challenge of analyzing the vast multidimensional spaces of modern machine learning models by mapping representations into a semantic 2D space. The resulting collages visually reveal patterns and relationships within internal representations. We demonstrate VISTA's utility by applying it to sparse autoencoder latents, uncovering new properties and interpretations. We review the VISTA methodology, present findings from our case study (https://got.drib.net/latents/), and discuss implications for neural network interpretability across various domains of machine learning.
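The projection step described above can be sketched in a few lines. The abstract does not specify which dimensionality-reduction technique VISTA uses, so the snippet below uses PCA purely as an illustrative stand-in, and `latents` is synthetic placeholder data rather than real sparse-autoencoder activations:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stand-in for SAE latent directions: 1000 latents, 512 dims each.
rng = np.random.default_rng(0)
latents = rng.normal(size=(1000, 512))

# Project the high-dimensional representations down to 2D coordinates,
# which could then be used to lay out a visual collage.
coords = PCA(n_components=2).fit_transform(latents)
print(coords.shape)
```

In a real pipeline the 2D coordinates would typically come from a semantics-aware embedding method and then be snapped to a grid for collage rendering; this sketch only shows the reduction step itself.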