We explore neural synthesis of acoustic guitar from string-wise MIDI input. We propose four systems and compare them, using both objective metrics and subjective evaluation, against natural audio and a sample-based baseline. We develop the four systems iteratively, varying the architecture and the intermediate tasks, such as predicting pitch and loudness control features. We find that formulating control feature prediction as a classification task rather than a regression task yields better results. Furthermore, our simplest proposed system, which predicts synthesis parameters directly from MIDI input, performs best of the four. Audio examples are available at https://erl-j.github.io/neural-guitar-web-supplement.