Adaptive Evidence Weighting for Audio-Spatiotemporal Fusion

Many machine learning systems have access to multiple sources of evidence for the same prediction target, yet these sources often differ in reliability and informativeness across inputs. In bioacoustic classification, species identity may be inferred both from the acoustic signal and from spatiotemporal context such as location and season. Bayesian inference motivates combining such evidence multiplicatively, but in practice we typically have access only to discriminative predictors rather than calibrated generative models. We introduce \textbf{F}usion under \textbf{IN}dependent \textbf{C}onditional \textbf{H}ypotheses (\textbf{FINCH}), an adaptive log-linear evidence fusion framework that integrates a pre-trained audio classifier with a structured spatiotemporal predictor. FINCH learns a per-sample gating function that estimates the reliability of contextual information from uncertainty and informativeness statistics. The resulting fusion family \emph{contains} the audio-only classifier as a special case and explicitly bounds the influence of contextual evidence, yielding a risk-contained hypothesis class with an interpretable audio-only fallback. Across benchmarks, FINCH consistently outperforms fixed-weight fusion and audio-only baselines, improving robustness and error trade-offs even when contextual information is weak in isolation. We achieve state-of-the-art performance on CBI and competitive or improved performance on several subsets of BirdSet using a lightweight, interpretable, evidence-based approach. Code is available at \texttt{\href{https://anonymous.4open.science/r/birdnoise-85CD/README.md}{anonymous-repository}}.
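
As an illustration of the gated log-linear fusion described above, one possible form is sketched here. The notation is assumed for this sketch and is not taken from the paper: $p_a(y \mid x_a)$ denotes the audio classifier's posterior, $p_c(y \mid x_c)$ the spatiotemporal predictor's posterior, and $w(x) \in [0, w_{\max}]$ a hypothetical per-sample gate output with an assumed upper bound $w_{\max}$.
\[
\log p_{\mathrm{fused}}(y \mid x_a, x_c) = \log p_a(y \mid x_a) + w(x)\,\log p_c(y \mid x_c) - \log Z(x),
\qquad
Z(x) = \sum_{y'} p_a(y' \mid x_a)\, p_c(y' \mid x_c)^{w(x)}.
\]
Under this sketch, setting $w(x) = 0$ gives $Z(x) = 1$ and recovers the audio-only classifier exactly, while the bound $w(x) \le w_{\max}$ caps how strongly contextual evidence can shift the audio prediction.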
