Conventional audio equalization is a static process that requires cumbersome manual adjustments to adapt to changing listening contexts (e.g., mood, location, or social setting). In this paper, we introduce a Large Language Model (LLM)-based alternative that maps natural language text prompts to equalization settings, enabling a conversational approach to sound system control. Using data collected from a controlled listening experiment, our models exploit in-context learning and parameter-efficient fine-tuning to reliably align with population-preferred equalization settings. Our evaluation, which leverages distributional metrics that capture users' varied preferences, shows statistically significant improvements in distributional alignment over random-sampling and static-preset baselines. These results indicate that LLMs could function as "artificial equalizers," contributing to more accessible, context-aware, and expert-level audio tuning methods.