Recent decades have seen the rise of large-scale Deep Neural Networks (DNNs) that achieve human-competitive performance in a variety of artificial intelligence tasks. Often consisting of hundreds of millions, if not hundreds of billions, of parameters, these DNNs are too large to be deployed to, or efficiently run on, resource-constrained devices such as mobile phones or IoT microcontrollers. Systems relying on large-scale DNNs thus have to call the corresponding model over the network, leading to substantial costs for hosting and running the large-scale remote model, costs which are often charged on a per-use basis. In this paper, we propose BiSupervised, a novel architecture in which, before relying on a large remote DNN, a system attempts to make a prediction with a small-scale local model. A DNN supervisor monitors this prediction process and identifies easy inputs for which the local prediction can be trusted. For these inputs, the remote model does not have to be invoked, which saves costs while only marginally impacting the overall system accuracy. Our architecture furthermore includes a second supervisor that monitors the remote predictions and identifies inputs for which not even these can be trusted, allowing the system to raise an exception or run a fallback strategy instead. We evaluate the cost savings and the ability to detect incorrectly predicted inputs on four diverse case studies: IMDB movie review sentiment classification, GitHub issue triaging, ImageNet image classification, and SQuADv2 free-text question answering.
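The control flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the supervisors score a prediction, so the sketch assumes each model returns a confidence value in [0, 1] and that each supervisor is a simple threshold on that value. The names `bisupervised_predict`, `Outcome`, `toy_local`, and `toy_remote`, as well as the threshold values, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple

# Assumption: a model maps an input to (prediction, confidence in [0, 1]).
Model = Callable[[Any], Tuple[Any, float]]


@dataclass
class Outcome:
    prediction: Any
    source: str  # "local", "remote", or "fallback"


def bisupervised_predict(
    x: Any,
    local_model: Model,
    remote_model: Model,
    local_threshold: float,
    remote_threshold: float,
    fallback: Optional[Callable[[Any], Any]] = None,
) -> Outcome:
    """Hypothetical two-supervisor cascade (thresholds stand in for supervisors)."""
    # Step 1: try the small local model first.
    local_pred, local_conf = local_model(x)

    # First supervisor: trust the local prediction for "easy" inputs,
    # which avoids the paid remote call entirely.
    if local_conf >= local_threshold:
        return Outcome(local_pred, "local")

    # Step 2: the input looks hard, so invoke the large remote model.
    remote_pred, remote_conf = remote_model(x)

    # Second supervisor: accept the remote prediction only if it can be trusted.
    if remote_conf >= remote_threshold:
        return Outcome(remote_pred, "remote")

    # Step 3: not even the remote prediction is trusted.
    if fallback is None:
        raise RuntimeError("no trusted prediction available for this input")
    return Outcome(fallback(x), "fallback")


# Toy stand-ins for the two models (purely illustrative, not real classifiers).
def toy_local(text: str) -> Tuple[str, float]:
    return ("positive", 0.95) if "great" in text else ("negative", 0.55)


def toy_remote(text: str) -> Tuple[str, float]:
    return ("negative", 0.90) if "bad" in text else ("positive", 0.40)


if __name__ == "__main__":
    for review in ["great movie", "bad movie"]:
        print(review, bisupervised_predict(review, toy_local, toy_remote, 0.8, 0.7))
```

In this sketch, raising `local_threshold` sends more inputs to the remote model (higher cost, typically higher accuracy), while raising `remote_threshold` routes more inputs to the exception or fallback path. How these trade-offs are actually tuned is not specified in the abstract.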