Engine sounds originate from a sequence of exhaust pressure pulses rather than from sustained harmonic oscillations. Whereas neural synthesis methods typically aim to approximate the resulting spectral characteristics, we propose modeling the underlying pulse shapes and temporal structure directly. We present the Pulse-Train-Resonator (PTR) model, a differentiable synthesis architecture that generates engine audio as parameterized pulse trains aligned to engine firing patterns and propagates them through recursive Karplus-Strong resonators that simulate exhaust acoustics. The architecture integrates physics-informed inductive biases, including harmonic decay, thermodynamic pitch modulation, valve-dynamics envelopes, exhaust-system resonances, and derived engine operating modes such as throttle operation and Deceleration Fuel Cutoff (DFCO). Validated on three diverse engine types totaling 7.5 hours of audio, PTR achieves a 21% improvement in harmonic reconstruction and a 5.7% reduction in total loss over a harmonic-plus-noise baseline, while providing interpretable parameters that correspond to physical phenomena. Complete code, model weights, and audio examples are openly available.
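The abstract names two core components: a pulse train aligned to the firing pattern, and a Karplus-Strong resonator. Both can be illustrated with a minimal, non-differentiable sketch. It rests on several assumptions:

- The engine is an evenly fired four-stroke engine, so each cylinder fires once every two crankshaft revolutions.
- Each pulse has a fixed Hann-shaped profile.
- A single resonator uses a hand-picked feedback gain.

The function names and parameters (`firing_rate_hz`, `pulse_train`, `karplus_strong_resonator`, `pulse_width_s`, `feedback`) are hypothetical and are not taken from the PTR code release.

```python
import math


def firing_rate_hz(rpm: float, cylinders: int) -> float:
    """Firing frequency of an evenly fired four-stroke engine (assumption).

    Each cylinder fires once every two crankshaft revolutions.
    """
    return rpm / 60.0 * cylinders / 2.0


def pulse_train(rpm: float, cylinders: int, sr: int, duration_s: float,
                pulse_width_s: float = 0.002) -> list[float]:
    """Place one Hann-shaped pressure pulse at each firing instant.

    The fixed Hann pulse shape is an illustrative assumption; PTR
    parameterizes its pulse shapes instead.
    """
    n = int(sr * duration_s)
    out = [0.0] * n
    period = sr / firing_rate_hz(rpm, cylinders)  # samples between firings
    width = max(3, int(sr * pulse_width_s))       # >= 3 so the pulse is nonzero
    t = 0.0
    while int(t) < n:
        start = int(t)
        for k in range(width):
            if start + k >= n:
                break
            # Hann window: smooth rise and fall of one exhaust pulse
            out[start + k] += 0.5 - 0.5 * math.cos(2 * math.pi * k / (width - 1))
        t += period
    return out


def karplus_strong_resonator(x: list[float], sr: int, resonance_hz: float,
                             feedback: float = 0.98) -> list[float]:
    """Drive a Karplus-Strong loop (delay line + two-tap average) with x.

    y[n] = x[n] + feedback * 0.5 * (y[n - L] + y[n - L - 1])
    The two-tap average adds half a sample of delay, so the loop
    resonates near sr / (L + 0.5). feedback < 1 keeps it stable.
    """
    L = max(1, int(round(sr / resonance_hz)))
    y = [0.0] * len(x)
    for i in range(len(x)):
        a = y[i - L] if i - L >= 0 else 0.0
        b = y[i - L - 1] if i - L - 1 >= 0 else 0.0
        y[i] = x[i] + feedback * 0.5 * (a + b)
    return y


if __name__ == "__main__":
    sr = 16000
    # 4 cylinders at 3000 rpm -> 100 firings per second
    x = pulse_train(rpm=3000, cylinders=4, sr=sr, duration_s=0.5)
    y = karplus_strong_resonator(x, sr, resonance_hz=120.0)
    print(len(y), max(abs(v) for v in y))
```

In PTR these quantities are parameterized and the architecture is differentiable. This sketch fixes them by hand, so it only shows how firing-locked excitation and a recursive resonator combine. It does not show how the model is trained.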