Predicting Wind-Driven Spatial Deposition through Simulated Color Images using Deep Autoencoders

For centuries, scientists have observed nature to understand the laws that govern the physical world. The traditional process of turning observations into physical understanding is slow: imperfect models are constructed and tested to explain relationships in data. Powerful new algorithms can enable computers to learn physics by observing images and videos. Inspired by this idea, we train machine learning models on images, that is, pixel information, rather than on physical quantities. For this work, and as a proof of concept, the physics of interest are wind-driven spatial patterns. These phenomena include features in aeolian dunes and volcanic ash deposition, wildfire smoke, and air pollution plumes. We use computer model simulations of spatial deposition patterns to approximate images from a hypothetical imaging device whose outputs are red, green, and blue (RGB) color images with channel values ranging from 0 to 255. In this paper, we explore deep convolutional neural network-based autoencoders to exploit relationships in wind-driven spatial patterns, which commonly occur in the geosciences, and to reduce their dimensionality. Reducing the dimensionality of the data with an encoder makes it practical to train deep, fully connected neural network models that link geographic and meteorological scalar input quantities to the encoded space. Once this mapping is learned, full spatial patterns are reconstructed from the predicted encoding using the decoder. We demonstrate this approach on images of spatial deposition from a pollution source, where the encoder compresses the dimensionality to 0.02% of the original size, and the full predictive model achieves, on test data, a normalized root mean squared error of 8%, a figure of merit in space of 94%, and a precision-recall area under the curve of 0.93.
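As an illustration of two of the reported metrics, the following minimal Python sketch computes a normalized root mean squared error and a figure of merit in space on flattened pixel values. The definitions are assumptions made for illustration and are not taken from this abstract: the error is normalized by the range of the true values (other normalizations exist), and the figure of merit in space is taken to be the ratio of the overlap area to the union area of the predicted and true regions that meet a chosen threshold. The function names, the threshold, and the toy values are hypothetical.

```python
import math


def nrmse(pred, true):
    """RMSE divided by the range of the true values.

    Assumption: range normalization; the paper may normalize differently.
    """
    if len(pred) != len(true) or not true:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)
    value_range = max(true) - min(true)
    if value_range == 0:
        raise ValueError("true values have zero range")
    return math.sqrt(mse) / value_range


def figure_of_merit_in_space(pred, true, threshold):
    """Overlap area over union area of the regions at or above `threshold`.

    Assumption: each pixel covers the same area, so areas are pixel counts.
    """
    pred_mask = [p >= threshold for p in pred]
    true_mask = [t >= threshold for t in true]
    overlap = sum(p and t for p, t in zip(pred_mask, true_mask))
    union = sum(p or t for p, t in zip(pred_mask, true_mask))
    if union == 0:
        # Convention chosen here: two empty regions agree perfectly.
        return 1.0
    return overlap / union


# Hypothetical flattened deposition values for a four-pixel image.
true_pixels = [0, 0, 10, 20]
pred_pixels = [0, 5, 10, 15]

print(f"NRMSE: {nrmse(pred_pixels, true_pixels):.3f}")
print(f"FMS:   {figure_of_merit_in_space(pred_pixels, true_pixels, 5):.3f}")
```

Both functions operate on flattened sequences, so an image would first be reduced to a single channel or scalar field and unrolled into a list before evaluation.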
