We present SoundPlot, an open-source framework for analyzing avian vocalizations through acoustic feature extraction, dimensionality reduction, and neural audio synthesis. The system transforms audio signals into a multi-dimensional acoustic feature space, enabling real-time visualization of temporal dynamics in 3D using web-based interactive graphics. Our framework implements a complete analysis-synthesis pipeline that extracts spectral features (centroid, bandwidth, contrast), pitch contours via probabilistic YIN (pYIN), and mel-frequency cepstral coefficients (MFCCs), mapping them to a unified timbre space for visualization. Audio reconstruction employs the Griffin-Lim phase estimation algorithm applied to mel spectrograms. The accompanying Three.js-based interface provides dual-viewport visualization comparing original and synthesized audio trajectories with independent playback controls. We demonstrate the framework's capabilities through comprehensive waveform analysis, spectrogram comparisons, and feature space evaluation using Principal Component Analysis (PCA). Quantitative evaluation shows mel spectrogram correlation scores exceeding 0.92, indicating high-fidelity preservation of perceptual acoustic structure. SoundPlot is released under the MIT License to facilitate research in bioacoustics, audio signal processing, and computational ethology.
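Two of the quantities named above can be illustrated with a minimal, standard-library sketch. It is not SoundPlot's implementation. It assumes that spectral centroid and bandwidth are the magnitude-weighted mean frequency and the weighted standard deviation around it, and that the reported "mel spectrogram correlation" is a Pearson correlation over flattened spectrogram values; the abstract does not define the metric, so that reading is an assumption. All function names are hypothetical, and a naive DFT stands in for the FFT an audio library would normally supply.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (adequate for short demo frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def centroid_and_bandwidth(frame, sample_rate):
    """Return (centroid, bandwidth) in Hz for one frame.

    Assumed definitions: centroid is the magnitude-weighted mean frequency,
    bandwidth is the magnitude-weighted standard deviation around it.
    """
    mags = magnitude_spectrum(frame)
    n = len(frame)
    freqs = [k * sample_rate / n for k in range(len(mags))]
    total = sum(mags)
    if total == 0:
        return 0.0, 0.0  # silent frame: no meaningful centroid
    centroid = sum(f * m for f, m in zip(freqs, mags)) / total
    variance = sum((f - centroid) ** 2 * m for f, m in zip(freqs, mags)) / total
    return centroid, math.sqrt(variance)


def spectrogram_correlation(spec_a, spec_b):
    """Pearson correlation between two equally shaped spectrograms.

    Each spectrogram is a list of frames (lists of floats). Treating the
    metric as Pearson correlation is an assumption, not taken from the source.
    """
    a = [v for frame in spec_a for v in frame]
    b = [v for frame in spec_b for v in frame]
    if len(a) != len(b) or not a:
        raise ValueError("spectrograms must be non-empty and equally shaped")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # correlation is undefined for constant input
    return cov / math.sqrt(var_a * var_b)


# Illustrative input: a 1 kHz sine sampled at 8 kHz, which falls exactly on a DFT bin.
tone = [math.sin(2 * math.pi * 1000 * t / 8000) for t in range(64)]
centroid_hz, bandwidth_hz = centroid_and_bandwidth(tone, 8000)
```

For a pure tone, the centroid sits at the tone frequency and the bandwidth is close to zero. A correlation near 1 means only that the two spectrograms vary together linearly; it does not by itself establish perceptual equivalence between original and reconstructed audio.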