In this paper we present the first steps towards a tool that enables artists to create music visualizations using pre-trained, generative, machine learning models. First, we investigate the application of network bending, the process of applying transforms within the layers of a generative network, to image-generation diffusion models, utilizing a range of point-wise, tensor-wise, and morphological operators. We identify a number of visual effects produced by these operators, including some that are not easily recreated with standard image editing tools. We find that this process allows continuous, fine-grained control over image generation, which is useful for creative applications. Next, we generate music-reactive videos with Stable Diffusion by passing audio features as parameters to network bending operators. Finally, we comment on certain transforms that radically shift the image, and on the possibility of learning more about the latent space of Stable Diffusion from these transforms.
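The core idea of audio-driven network bending can be sketched as follows: a point-wise operator is applied to an intermediate activation tensor, with the operator's parameter driven by an audio feature such as per-frame RMS energy. This is a minimal, illustrative sketch using a toy NumPy array in place of a real diffusion-model feature map; the function names and the specific scaling scheme are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def pointwise_scale(activation, amount):
    """Point-wise network bending operator: multiply every element by `amount`."""
    return activation * amount

def pointwise_threshold(activation, t):
    """Point-wise network bending operator: zero out elements at or below `t`."""
    return np.where(activation > t, activation, 0.0)

def rms(frame):
    """Audio feature: root-mean-square energy of one audio frame."""
    return float(np.sqrt(np.mean(frame ** 2)))

# Toy "layer activation" standing in for an intermediate UNet feature map.
activation = np.ones((4, 8, 8))

# One frame of audio; its RMS energy parameterizes the bending operator,
# so louder frames produce stronger transforms (an illustrative mapping).
audio_frame = np.array([0.5, -0.5, 0.5, -0.5])
bent = pointwise_scale(activation, 1.0 + rms(audio_frame))
```

In a real pipeline, such an operator would be registered as a hook on a chosen layer of the diffusion model's denoising network and re-parameterized once per video frame from the corresponding audio analysis window.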