Visuals are a core part of our experience of music, owing to the way they can amplify the emotions and messages conveyed through the music. However, creating music visualizations is a complex, time-consuming, and resource-intensive process. We introduce Generative Disco, a generative AI system that helps generate music visualizations with large language models and text-to-image models. Users select intervals of music to visualize and then parameterize each visualization by defining a start prompt and an end prompt. The system warps between these prompts and generates images in time with the beat of the music, producing audioreactive video. We introduce two design patterns for improving generated videos: "transitions", which express shifts in color, time, subject, or style, and "holds", which encourage visual emphasis and consistency. A study with professionals showed that the system was enjoyable, easy to explore, and highly expressive. We conclude with use cases of Generative Disco for professionals and a discussion of how AI-generated content is changing the landscape of creative work.
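To make the idea of warping between a start and an end prompt in time with the beat more concrete, the following is a minimal sketch and not the system's actual implementation. It assumes that beat timestamps for the selected interval are already known and that each prompt has been encoded as a plain numeric vector. The names `frame_weights` and `lerp`, the cubic easing, and the use of linear blending are all illustrative assumptions; the abstract does not say how the warp is computed.

```python
from bisect import bisect_right


def frame_weights(start, end, beats, fps):
    """Return one blend weight in [0, 1] per video frame (hypothetical helper).

    Anchors are the interval start, every beat strictly inside the interval,
    and the interval end. The weight rises by an equal share between
    consecutive anchors, with cubic easing so that most of the visual change
    lands just before each beat.
    """
    if end <= start:
        raise ValueError("end must be greater than start")
    anchors = [start] + sorted(b for b in beats if start < b < end) + [end]
    n_segments = len(anchors) - 1
    n_frames = int(round((end - start) * fps))
    weights = []
    for i in range(n_frames + 1):
        t = min(start + i / fps, end)
        # Index of the segment containing t (clamped so t == end stays valid).
        k = min(bisect_right(anchors, t) - 1, n_segments - 1)
        seg = (t - anchors[k]) / (anchors[k + 1] - anchors[k])
        eased = seg ** 3  # assumed easing: slow at first, fast into the beat
        weights.append((k + eased) / n_segments)
    return weights


def lerp(a, b, w):
    """Linearly blend two equal-length vectors (stand-ins for prompt encodings)."""
    return [(1 - w) * x + w * y for x, y in zip(a, b)]


# Example: a 2-second interval with a single beat at 1.0 s, rendered at 2 fps.
weights = frame_weights(0.0, 2.0, beats=[1.0], fps=2)
start_vec, end_vec = [0.0, 2.0], [4.0, 2.0]  # toy prompt encodings
frames = [lerp(start_vec, end_vec, w) for w in weights]
```

In this sketch each blended vector would be passed to a text-to-image model to render one frame. A "hold" could be approximated by keeping the weight constant over a span, and a "transition" by choosing start and end prompts that differ in color, time, subject, or style.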