Natural language is commonly used to describe instrument timbre, with adjectives such as "warm" or "heavy". Because these descriptors are grounded in human perception, listeners can disagree about which acoustic features correspond to a given adjective. In this work, we pursue a data-driven approach to further our understanding of such adjectives in the context of guitar tone. Our main contribution is a dataset of timbre adjectives, constructed by processing single clips of instrument audio to produce varied timbres through adjustments in EQ and effects such as distortion. Adjective annotations are obtained for each clip by crowdsourcing experts, who complete a pairwise comparison task and a labeling task. Our analysis of the dataset reveals correlations between adjective ratings and highlights instances where the data contradict prevailing theories relating spectral features to timbral adjectives, suggesting the need for a more nuanced, data-driven understanding of timbre.
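The idea of deriving several timbres from a single clip through EQ and distortion can be illustrated with a minimal sketch. This is not the authors' processing pipeline: the synthetic decaying tone merely stands in for a recorded guitar clip, and the helper names (`synth_tone`, `low_pass_eq`, `distort`, `spectral_centroid`), the sample rate, and all parameter values are assumptions made for illustration. The spectral centroid is used here only as one example of a spectral feature, often associated with perceived brightness.

```python
import numpy as np

SAMPLE_RATE = 16_000  # Hz; assumed value for this sketch


def synth_tone(freq=110.0, duration=1.0, sr=SAMPLE_RATE):
    """Decaying harmonic tone used as a stand-in for a recorded guitar clip."""
    t = np.arange(int(duration * sr)) / sr
    harmonics = sum(np.sin(2 * np.pi * freq * k * t) / k for k in range(1, 6))
    return 0.3 * np.exp(-3.0 * t) * harmonics


def low_pass_eq(signal, cutoff_hz, sr=SAMPLE_RATE):
    """Crude EQ: zero every FFT bin above cutoff_hz (brick-wall low-pass)."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    spectrum[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))


def distort(signal, gain):
    """Soft-clipping distortion via tanh; output stays within [-1, 1]."""
    return np.tanh(gain * signal)


def spectral_centroid(signal, sr=SAMPLE_RATE):
    """Magnitude-weighted mean frequency of the signal, in Hz."""
    magnitudes = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    return float(np.sum(freqs * magnitudes) / np.sum(magnitudes))


# Derive several timbre variants from one source clip.
clip = synth_tone()
variants = {
    "original": clip,
    "low_passed": low_pass_eq(clip, cutoff_hz=300.0),
    "distorted": distort(clip, gain=10.0),
}
centroids = {name: spectral_centroid(audio) for name, audio in variants.items()}
```

Each entry in `centroids` gives one acoustic measurement per variant, which could then be compared against listener ratings for an adjective. Whether such a feature actually tracks a given adjective is the kind of question the dataset is meant to test, so the sketch makes no claim about that relationship.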