How do you scale a machine learning product at a startup? In particular, how do you serve a greater volume, velocity, and variety of queries cost-effectively? We break costs down into variable costs (the cost of serving the model and keeping it performant) and fixed costs (the cost of developing and training new models). We propose a framework for conceptualizing these costs, break them into finer categories, and outline ways to reduce them. Lastly, since in our experience the most expensive fixed cost of a machine learning system is identifying the root causes of failures and driving continuous improvement, we present a way to conceptualize these issues and share our methodology for addressing them.