As large language models (LLMs) attract increasing attention and find widespread application, challenges to their reliability arise in parallel. Confidence calibration, an effective analysis method for gauging the reliability of deep models, is a crucial tool for assessing and improving that reliability. However, the calibration of LLMs remains comparatively underexplored. In this work, we systematically examine the calibration of aligned language models throughout the entire construction process, including pretraining and alignment training. At each stage, we investigate how different training settings, such as parameter scale and training data, affect model calibration. To assess calibration thoroughly, we evaluate models on the three aspects of greatest concern: generation, factuality, and understanding. Our work sheds light on whether popular LLMs are well calibrated and how the training process influences model calibration.