Evaluating generative spatial audio in First-Order Ambisonics (FOA) remains challenging because it is poorly understood how evaluation metrics respond to changes in spatial parameters such as azimuth and elevation. We propose a framework for analyzing metric sensitivity along continuous spatial trajectories, drawing on principles of sensitivity analysis in parametric sound synthesis. Using controlled FOA scenes of increasing complexity, we define three desiderata for metric behavior: Responsiveness, Smoothness, and Symmetry. We assess standard distribution-based and sample-based metrics, including Fréchet Audio Distance (FAD), intensity vectors, and acoustic maps. Our findings show that both FAD computed on localization-specific embeddings and acoustic maps yield high Responsiveness together with robust Smoothness and Symmetry across conditions, whereas intensity vectors degrade with increasing scene complexity. This work is a first step towards investigating the sensitivity of metrics for generative spatial audio.
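As a rough illustration of what sweeping a metric along a spatial trajectory could look like, the sketch below encodes a single noise source into FOA, moves it in azimuth, and tracks an intensity-vector angular distance to a fixed reference. Everything in it is an assumption made for illustration: the ACN/SN3D channel convention, the function names (`encode_foa`, `intensity_direction`, `angular_distance`), and in particular the three summary numbers, which are simple hypothetical proxies and not the definitions of Responsiveness, Smoothness, and Symmetry used in this work.

```python
import numpy as np

def encode_foa(signal, azimuth, elevation):
    """Encode a mono signal as FOA (assumed ACN order W, Y, Z, X with SN3D gains)."""
    w = signal
    y = signal * np.sin(azimuth) * np.cos(elevation)
    z = signal * np.sin(elevation)
    x = signal * np.cos(azimuth) * np.cos(elevation)
    return np.stack([w, y, z, x])

def intensity_direction(foa):
    """Unit vector (x, y, z) of the time-averaged pseudo-intensity W * [X, Y, Z]."""
    w, y, z, x = foa
    vec = np.array([np.mean(w * x), np.mean(w * y), np.mean(w * z)])
    return vec / np.linalg.norm(vec)

def angular_distance(foa_ref, foa_test):
    """Angle in radians between the intensity directions of two FOA signals."""
    cos_angle = np.dot(intensity_direction(foa_ref), intensity_direction(foa_test))
    return np.arccos(np.clip(cos_angle, -1.0, 1.0))

# Single noise source; the reference scene places it straight ahead.
rng = np.random.default_rng(0)
source = rng.standard_normal(4800)
reference = encode_foa(source, 0.0, 0.0)

# Sweep the test source along an azimuth trajectory from -90 to +90 degrees.
azimuths = np.deg2rad(np.arange(-90, 91, 10))
curve = np.array(
    [angular_distance(reference, encode_foa(source, az, 0.0)) for az in azimuths]
)

# Hypothetical proxies for the three desiderata (illustrative only).
responsiveness = curve.max() - curve.min()       # how much the metric moves at all
smoothness = np.abs(np.diff(curve, n=2)).max()   # largest second difference (lower is smoother)
symmetry_gap = np.abs(curve - curve[::-1]).max() # mismatch between +az and -az
```

For this single-source case the curve simply follows the absolute azimuth offset, so the metric is responsive and symmetric by construction; the more interesting behavior described above is expected only once additional sources are mixed into the scene.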