Reinforcement Learning as a Parsimonious Alternative to Prediction Cascades: A Case Study on Image Segmentation

Deep learning architectures have achieved state-of-the-art (SOTA) performance on computer vision tasks such as object detection and image segmentation. This may be attributed to the use of over-parameterized, monolithic deep learning architectures executed on large datasets. Although such architectures improve accuracy, the improvement usually comes with a large increase in computation and memory requirements during inference. While this is a non-issue in traditional machine learning pipelines, the recent confluence of machine learning and fields like the Internet of Things has rendered such large architectures infeasible to execute in low-resource settings. For such settings, previous efforts have proposed decision cascades, in which inputs are passed through models of increasing complexity until the desired performance is achieved. However, we argue that cascaded prediction increases computational cost because of wasteful intermediate computations. To address this, we propose PaSeR (Parsimonious Segmentation with Reinforcement Learning), a non-cascading, cost-aware learning pipeline, as an alternative to cascaded architectures. Through experimental evaluation on real-world and standard datasets, we demonstrate that PaSeR achieves better accuracy while minimizing computational cost relative to cascaded models. Further, we introduce a new metric, IoU/GigaFlop, to evaluate the balance between cost and performance. On the real-world task of battery material phase segmentation, PaSeR yields a minimum performance improvement of 174% on the IoU/GigaFlop metric with respect to baselines. We also demonstrate PaSeR's adaptability to complementary models trained on a noisy MNIST dataset, where it achieves a minimum performance improvement of 13.4% on IoU/GigaFlop over SOTA models. Code and data are available at https://github.com/scailab/paser.
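The cost argument and the IoU/GigaFlop metric can be illustrated with a minimal sketch. It assumes the metric is simply IoU divided by the inference cost in GFLOPs, which is how the name reads but may differ from the paper's exact definition. The function names, the per-model costs, and the `selector_gflops` parameter are hypothetical and chosen only for illustration; they are not taken from the PaSeR code base.

```python
import numpy as np


def iou(pred, target):
    """Intersection over union of two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        # Both masks empty: treated as perfect agreement (a convention, assumed here).
        return 1.0
    return float(np.logical_and(pred, target).sum() / union)


def iou_per_gigaflop(iou_value, gflops):
    """Assumed form of the metric: accuracy obtained per unit of compute."""
    if gflops <= 0:
        raise ValueError("gflops must be positive")
    return iou_value / gflops


def cascade_cost(model_gflops, stop_index):
    """A cascade runs every model up to and including the one it stops at."""
    return sum(model_gflops[: stop_index + 1])


def direct_cost(model_gflops, chosen_index, selector_gflops=0.0):
    """A non-cascading pipeline pays for one chosen model plus a (hypothetical) selector."""
    return selector_gflops + model_gflops[chosen_index]


# Illustrative costs (GFLOPs) for a small, a medium, and a large model.
model_gflops = [1.0, 4.0, 16.0]

pred = [[1, 1], [0, 0]]
target = [[1, 0], [0, 0]]
score = iou(pred, target)  # 1 overlapping pixel / 2 pixels in the union

# A hard input that only the largest model handles well.
cascaded = iou_per_gigaflop(score, cascade_cost(model_gflops, 2))
direct = iou_per_gigaflop(score, direct_cost(model_gflops, 2))
print(cascaded, direct)
```

With the same IoU, the cascade scores lower on the metric because it also pays for the two smaller models it ran before reaching the large one. A non-cascading pipeline only comes out ahead if choosing the model is cheap, which is why the selector cost appears as an explicit term.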
