Synthesizers are powerful tools that allow musicians to create dynamic and original sounds. Existing commercial interfaces for synthesizers typically require musicians to interact with complex low-level parameters or to manage large libraries of premade sounds. To address these challenges, we implement SynthScribe, a full-stack system that uses multimodal deep learning to let users express their intentions at a much higher level. SynthScribe addresses three difficulties: (1) searching through existing sounds, (2) creating completely new sounds, and (3) making meaningful modifications to a given sound. It does so through three main features: a multimodal search engine for a large library of synthesizer sounds; a user-centered genetic algorithm with which completely new sounds can be created and selected according to the user's preferences; and a sound-editing support feature that highlights key control parameters with respect to a text- or audio-based query and gives examples for them. The results of our user studies show that SynthScribe can reliably retrieve and modify sounds while also affording the ability to create completely new sounds that expand musicians' creative horizons.
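The abstract does not specify the genetic operators SynthScribe uses. As a rough illustration of what a user-centered (interactive) genetic algorithm can look like, the sketch below assumes that each sound is a vector of synthesizer parameters normalized to [0, 1], that new candidates are produced by uniform crossover and Gaussian mutation, and that the user's choice of preferred sounds acts as the selection step. The names `crossover`, `mutate`, `next_generation`, and `simulated_user` are hypothetical. In a real system, each candidate would be rendered to audio so that the musician can choose by listening.

```python
import random
from typing import Callable, List, Optional

# Hypothetical encoding: one sound = list of synth parameters normalized to [0, 1].
Params = List[float]


def crossover(a: Params, b: Params, rng: random.Random) -> Params:
    # Uniform crossover: each parameter is copied from one of the two parents.
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def mutate(p: Params, rate: float, scale: float, rng: random.Random) -> Params:
    # Gaussian mutation applied to a random subset of parameters, clipped to [0, 1].
    out = []
    for x in p:
        if rng.random() < rate:
            x = min(1.0, max(0.0, x + rng.gauss(0.0, scale)))
        out.append(x)
    return out


def next_generation(
    population: List[Params],
    select: Callable[[List[Params]], List[Params]],
    size: int,
    rate: float = 0.2,
    scale: float = 0.1,
    rng: Optional[random.Random] = None,
) -> List[Params]:
    # The user (here, a callback) picks the sounds they like; those become parents.
    rng = rng or random.Random()
    parents = select(population)
    if not parents:
        raise ValueError("the user must select at least one sound")
    children = []
    while len(children) < size:
        a, b = rng.choice(parents), rng.choice(parents)
        children.append(mutate(crossover(a, b, rng), rate, scale, rng))
    return children


def simulated_user(population: List[Params]) -> List[Params]:
    # Hypothetical stand-in for a musician: prefers sounds with a high value of
    # parameter 0 (imagine it controls filter cutoff, i.e. "brightness").
    return sorted(population, key=lambda p: p[0], reverse=True)[:2]


# Usage: evolve 8 random 4-parameter sounds for 10 rounds of user selection.
rng = random.Random(0)
pop = [[rng.random() for _ in range(4)] for _ in range(8)]
for _ in range(10):
    pop = next_generation(pop, simulated_user, size=8, rng=rng)
```

Keeping a human in the selection step means no explicit fitness function is needed. This suits sound design, where "good" is subjective, at the cost of requiring the user to audition each generation.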