Machine Learning Systems are Bloated and Vulnerable

Today's software is bloated with both code and features that are not used by most users. This bloat is prevalent across the entire software stack, from the operating system, all the way to software backends, frontends, and web-pages. In this paper, we focus on analyzing and quantifying bloat in machine learning containers. We develop MMLB, a framework to analyze bloat in machine learning containers, measuring the amount of bloat that exists on the container and package levels. Our tool quantifies the sources of bloat and integrates with vulnerability analysis tools to evaluate the impact of bloat on container vulnerabilities. Through experimentation with 15 machine learning containers from Tensorflow, Pytorch, and NVIDIA, we show that bloat is a significant issue, accounting for up to 80% of the container size in some cases. Our results demonstrate that bloat significantly increases the container provisioning time by up to 370% and exacerbates vulnerabilities by up to 99%.

翻译：当今的软件充斥着大多数用户从未使用的代码和功能。这种臃肿现象遍及整个软件栈，从操作系统到软件后端、前端乃至网页。本文聚焦于分析和量化机器学习容器中的臃肿问题。我们开发了MMLB框架，用于分析机器学习容器的臃肿程度，测量容器与包层面的臃肿量。该工具量化了臃肿的来源，并与漏洞分析工具集成，以评估臃肿对容器漏洞的影响。通过对TensorFlow、PyTorch和NVIDIA的15个机器学习容器进行实验，我们证明臃肿是一个显著问题，在某些情况下占容器大小的80%。结果表明，臃肿使容器预配时间最多增加370%，且漏洞风险加剧高达99%。

相关内容

Machine Learning

关注 2251

机器学习（Machine Learning）是一个研究计算学习方法的国际论坛。该杂志发表文章，报告广泛的学习方法应用于各种学习问题的实质性结果。该杂志的特色论文描述研究的问题和方法，应用研究和研究方法的问题。有关学习问题或方法的论文通过实证研究、理论分析或与心理现象的比较提供了坚实的支持。应用论文展示了如何应用学习方法来解决重要的应用问题。研究方法论文改进了机器学习的研究方法。所有的论文都以其他研究人员可以验证或复制的方式描述了支持证据。论文还详细说明了学习的组成部分，并讨论了关于知识表示和性能任务的假设。官网地址：http://dblp.uni-trier.de/db/journals/ml/

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

专知会员服务

116+阅读 · 2020年4月5日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

【深度学习表格检测、信息提取和结构化】《Table Detection, Information Extraction and Structuring using Deep Learning》by Vihar Kurama

专知会员服务

38+阅读 · 2020年1月23日