When deploying machine learning models in production for any product or application, three properties are commonly desired. First, the models should be generalizable, in that we can extend them to further use cases as our knowledge of the domain develops. Second, they should be evaluable, so that there are clear performance metrics and computing those metrics in production settings is feasible. Finally, the deployment should be as cost-optimal as possible. In this paper we argue that these three objectives (i.e., generalization, evaluation, and cost-optimality) can often be relatively orthogonal. For large language models in particular, despite their performance advantages over conventional NLP models, enterprises need to carefully assess all three factors before making substantial investments in this technology. We propose a framework for generalization, evaluation, and cost modeling tailored specifically to large language models, offering insights into the intricacies of developing, deploying, and managing them.