Advances in artificial intelligence (AI) and deep learning have led to neural networks being used to generate lightning-speed answers to complex science questions, paintings in the style of Monet, or stories like those of Twain. Leveraging their computational speed and flexibility, neural networks are also being used to facilitate fast, likelihood-free statistical inference. However, it is not straightforward to use neural networks with data that are incomplete for various reasons, which precludes their use in many applications. A recently proposed approach to remedy this issue uses an appropriately padded data vector, together with a vector that encodes the missingness pattern, as input to a neural network. While computationally efficient, this "masking" approach is not robust to the missingness mechanism and can result in statistically inefficient inferences. Here, we propose an alternative approach based on the Monte Carlo expectation-maximization (EM) algorithm. Our EM approach is likelihood-free, substantially faster than the conventional EM algorithm since it does not require numerical optimization at each iteration, and more statistically efficient than the masking approach. This research addresses a prototypical problem of how AI can be improved by introducing Bayesian statistical thinking. We compare the two approaches to missingness using simulated incomplete data from a variety of spatial models. The utility of the methodology is shown on Arctic sea-ice data, analyzed using a novel hidden Potts model with an intractable likelihood.
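To make the two approaches concrete, the following is a minimal sketch in Python/NumPy under simplifying assumptions. The masking encoder follows the padded-data-plus-missingness-vector construction described above; the Monte Carlo EM step assumes a pre-trained, likelihood-free neural estimator applied to each conditionally completed dataset, so no numerical optimization occurs inside the iteration. The names `masked_input`, `mc_em_step`, `neural_estimator`, and `simulate_conditional`, and the i.i.d. Gaussian toy model in the demo, are illustrative stand-ins, not the authors' implementation.

```python
import numpy as np

# --- Masking approach: encode incompleteness in the network input ---
def masked_input(y):
    """Pad missing entries (NaN) with a constant and append a 0/1
    missingness-pattern vector; the concatenation is the network input."""
    mask = (~np.isnan(y)).astype(float)      # 1 = observed, 0 = missing
    y_pad = np.where(np.isnan(y), 0.0, y)    # padded data vector
    return np.concatenate([y_pad, mask])

# --- Monte Carlo EM: one illustrative iteration ---
def mc_em_step(y_obs, is_missing, theta, neural_estimator,
               simulate_conditional, H=30, rng=None):
    """Simulate the missing entries H times given the observed data and the
    current parameter value, then pass each completed dataset through a
    pre-trained neural estimator (a cheap forward pass, not an optimization).
    Averaging the per-completion estimates is one simple way to combine the
    Monte Carlo draws; the actual method may combine them differently."""
    rng = np.random.default_rng() if rng is None else rng
    estimates = []
    for _ in range(H):
        y_complete = y_obs.copy()
        y_complete[is_missing] = simulate_conditional(y_obs, is_missing,
                                                      theta, rng)
        estimates.append(neural_estimator(y_complete))
    return np.mean(estimates, axis=0)        # updated parameter estimate

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = rng.normal(loc=2.0, scale=1.0, size=20)
    miss = rng.random(20) < 0.3              # ~30% missing at random
    y[miss] = np.nan

    # Toy stand-ins: i.i.d. N(theta, 1) data, so the conditional simulator
    # ignores the observed values and the "neural" estimator is just the
    # sample mean (a real application would use a trained network).
    sim = lambda y_obs, m, theta, rng: rng.normal(theta, 1.0, size=m.sum())
    est = lambda y_complete: np.mean(y_complete)

    theta = 0.0
    for _ in range(20):                      # EM iterations
        theta = mc_em_step(y, miss, theta, est, sim, rng=rng)
    print("masked input length:", masked_input(y).size)
    print("EM estimate of theta:", theta)
```

In the spatial settings considered in the paper, `simulate_conditional` would presumably draw the missing field values given the observed ones under the working spatial model, and `neural_estimator` would be a neural estimator trained offline on simulated complete data, which is what makes each EM iteration a fast forward pass rather than a numerical optimization.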