We demonstrate that a single 3×3 convolutional kernel can produce emergent audio effects when trained on 200 samples from a personalized corpus. We achieve this through two key techniques: (1) Conditioning Aware Kernels (CAK), where output = input + (learned_pattern × control), with a soft-gate mechanism that preserves identity at zero control; and (2) AuGAN (Audit GAN), which reframes the adversarial question from "is this real?" to "did you apply the requested value?" Rather than learning to generate or detect forgeries, our networks cooperate to verify that the requested control was applied, discovering unique transformations in the process. The learned kernel exhibits a diagonal structure that creates frequency-dependent temporal shifts, producing musical effects that vary with input characteristics. Our results show the potential of adversarial training to discover audio transformations from minimal data, enabling new approaches to effect design.
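The CAK rule stated above can be sketched directly. The following is a minimal, illustrative PyTorch sketch, not the paper's implementation: the class name `CAKLayer`, the tensor shapes, and the plain multiplicative gate are assumptions; the paper's soft gate may take a different form, but any gate that vanishes at zero control yields identity.

```python
import torch
import torch.nn as nn


class CAKLayer(nn.Module):
    """Illustrative sketch of a Conditioning Aware Kernel (CAK) layer.

    A single 3x3 convolution extracts a learned pattern from the input
    spectrogram; the pattern is scaled by a scalar control value and added
    back to the input, so control = 0 leaves the input unchanged.
    """

    def __init__(self):
        super().__init__()
        # One learned 3x3 kernel on a single-channel spectrogram.
        self.kernel = nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)

    def forward(self, x, control):
        # x: (batch, 1, freq, time) spectrogram; control: (batch,) scalar.
        pattern = self.kernel(x)
        # Simple multiplicative gate (an assumption here): it vanishes at
        # control = 0, which is sufficient to preserve identity.
        gate = control.view(-1, 1, 1, 1)
        return x + pattern * gate
```

The AuGAN "audit" question can likewise be expressed as a regression-style check in which the auditor tries to recover the requested control value from the generator's output. The function and network names below are hypothetical, and the paper's actual objective may differ.

```python
def audit_loss(generator, auditor, x, control):
    """Sketch of an AuGAN-style audit objective (assumed form).

    The auditor does not judge real vs. fake; it checks whether the
    requested control value was actually applied by estimating it from
    the (input, output) pair.
    """
    y = generator(x, control)
    estimated = auditor(x, y)                    # auditor recovers the control
    return torch.mean((estimated - control) ** 2)
```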