In this work, we report our efforts to advance the standard operating procedure for developing Large Language Models (LLMs) and LLM-based systems or services in industry. We introduce the concept of the Large Language Model Development Lifecycle (LDLC) and highlight the importance of consistency testing in ensuring delivery quality. A principled approach to consistency testing, however, is usually overlooked by industrial practitioners and treated as non-urgent in academia, and current practical solutions are insufficiently rigorous and labor-intensive. We therefore propose a simple yet effective consistency test protocol, named SimCT. SimCT proactively checks consistency across different development stages of "bare metal" LLMs or associated services without accessing the model artifacts, aiming to expedite delivery by reducing the back-and-forth alignment communication among the multiple teams involved in different development stages. Specifically, SimCT encompasses response-wise and model-wise tests. We implement the protocol with LightGBM and Student's t-test for the two components respectively, and perform extensive experiments to substantiate the effectiveness of SimCT and its components.
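As a rough illustration of the model-wise component, the abstract's Student's t-test could compare matched evaluation scores of the same model artifact across two development stages. The sketch below is an assumption-laden toy, not the paper's implementation: the score samples, the paired-test formulation, and the 0.05 significance level are all illustrative choices.

```python
import math
import statistics

def paired_t_statistic(a, b):
    """Paired Student's t statistic for two matched score samples.

    t = mean(d) / (stdev(d) / sqrt(n)), where d are pairwise differences.
    """
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (ddof = 1)
    return mean_d / (sd_d / math.sqrt(n))

# Hypothetical per-prompt quality scores for the same model before and
# after a development stage (e.g., pre- and post-deployment).
scores_stage_a = [0.82, 0.79, 0.85, 0.81, 0.78, 0.84, 0.80, 0.83]
scores_stage_b = [0.81, 0.80, 0.84, 0.82, 0.77, 0.85, 0.79, 0.82]

t = paired_t_statistic(scores_stage_a, scores_stage_b)
# Two-tailed critical value for df = 7 at alpha = 0.05 is about 2.365;
# a smaller |t| fails to reject "the two stages score the same".
consistent = abs(t) < 2.365
```

Under this toy formulation, a |t| below the critical value is read as "no detectable drift between stages"; the actual SimCT protocol operates without access to model artifacts and pairs this model-wise test with a LightGBM-based response-wise test.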