Physics-Constrained Denoising Autoencoders for Data-Scarce Wildfire UAV Sensing

Wildfire monitoring requires high-resolution atmospheric measurements, yet low-cost sensors on Unmanned Aerial Vehicles (UAVs) exhibit baseline drift, cross-sensitivity, and response lag that corrupt concentration estimates. Traditional deep learning denoising approaches demand large datasets that are impractical to obtain from limited UAV flight campaigns. We present PC$^2$DAE, a physics-informed denoising autoencoder that addresses data scarcity by embedding physical constraints directly into the network architecture. Softplus activations enforce non-negative concentration estimates, and physically plausible temporal smoothing is built into the network, so outputs are physically admissible by construction rather than through loss-function penalties. The architecture employs hierarchical decoder heads for the Black Carbon, Gas, and CO$_2$ sensor families and comes in two variants: PC$^2$DAE-Lean (21k parameters) for edge deployment and PC$^2$DAE-Wide (204k parameters) for offline processing. We evaluate on 7,894 synchronized 1 Hz samples (approximately 2.2 hours of flight data) collected from UAV flights during prescribed burns in Saskatchewan, Canada, a dataset two orders of magnitude below typical deep learning requirements. PC$^2$DAE-Lean achieves a 67.3\% smoothness improvement and a 90.7\% high-frequency noise reduction with zero physics violations. In contrast, five baselines (LSTM-AE, U-Net, Transformer, CBDAE, DeSpaWN) produce 15--23\% negative outputs. The lean variant outperforms the wide variant (+5.6\% smoothness), suggesting that reduced capacity combined with a strong inductive bias prevents overfitting in data-scarce regimes. Training completes in under 65 seconds on consumer hardware.
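The idea of admissibility "by construction" can be illustrated with a minimal NumPy sketch. It is not the PC$^2$DAE implementation: the latent size, the head names and channel counts, the random weights, and the use of a centered moving average as the temporal smoother are all assumptions made for illustration. The sketch only shows why a softplus output followed by an averaging filter with non-negative weights cannot produce negative concentrations, whereas an unconstrained linear head can.

```python
import numpy as np

def softplus(x):
    """Numerically stable softplus, log(1 + exp(x)); never negative."""
    return np.logaddexp(0.0, x)

def moving_average(x, window=5):
    """Centered moving average along time (axis 0) with edge padding.

    Averaging with non-negative weights keeps non-negative inputs non-negative.
    Assumes an odd window so the output length matches the input length.
    """
    pad = window // 2
    padded = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
    kernel = np.ones(window) / window
    cols = [np.convolve(padded[:, j], kernel, mode="valid") for j in range(x.shape[1])]
    return np.stack(cols, axis=1)

def constrained_heads(latent, weights, window=5):
    """Apply one hypothetical decoder head per sensor family."""
    outputs = {}
    for name, (W, b) in weights.items():
        raw = latent @ W + b  # unconstrained pre-activation, may be negative
        outputs[name] = moving_average(softplus(raw), window)
    return outputs

def violation_rate(x):
    """Fraction of entries that are negative (physically inadmissible)."""
    return float(np.mean(x < 0))

# Illustrative data: 200 time steps of an 8-dimensional latent code (assumed sizes).
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 8))
weights = {
    "black_carbon": (rng.normal(size=(8, 2)), np.zeros(2)),
    "gas": (rng.normal(size=(8, 3)), np.zeros(3)),
    "co2": (rng.normal(size=(8, 1)), np.zeros(1)),
}

constrained = constrained_heads(latent, weights)
unconstrained = {name: latent @ W + b for name, (W, b) in weights.items()}

for name in weights:
    print(name,
          "constrained violations:", violation_rate(constrained[name]),
          "linear-head violations:", violation_rate(unconstrained[name]))
```

Because the constraint lives in the output layer, no penalty weight has to be tuned to trade reconstruction error against non-negativity, which is the design choice the abstract contrasts with loss-function penalties.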
