Autoregressive language models typically use a temperature parameter at inference time to shape the probability distribution and control the randomness of the generated text. After text has been generated, this parameter can be estimated with a maximum-likelihood approach. Building on this, we propose a procedure to estimate the temperature of any text, including text written by humans, with respect to a given language model. We evaluate the temperature-estimation capability of a wide selection of small-to-medium Large Language Models (LLMs). We then use the best-performing model, Qwen3 14B, to estimate the temperatures of popular corpora, finding that while most measured temperatures are close to 1, notable exceptions include Jokes, GSM8K, and AG News (around 1.1) and Python code (around 0.9).
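As a sketch of the maximum-likelihood idea (assuming access to the model's per-position logits for the text; the function names and the golden-section search are illustrative, not the paper's exact procedure):

```python
import numpy as np

def nll_at_temperature(logits, token_ids, T):
    """Negative log-likelihood of the observed tokens under softmax(logits / T).

    logits: (N, V) array of pre-softmax scores at each of N positions.
    token_ids: (N,) array of the token actually observed at each position.
    """
    z = logits / T
    zmax = z.max(axis=1, keepdims=True)
    # log-sum-exp computed stably per position
    log_Z = np.log(np.exp(z - zmax).sum(axis=1)) + zmax[:, 0]
    return float((log_Z - z[np.arange(len(token_ids)), token_ids]).sum())

def estimate_temperature(logits, token_ids, lo=0.05, hi=5.0, iters=60):
    """Find the temperature minimizing the NLL via golden-section search
    (the NLL is well-behaved in T in practice)."""
    phi = (np.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    for _ in range(iters):
        c, d = b - phi * (b - a), a + phi * (b - a)
        if nll_at_temperature(logits, token_ids, c) < nll_at_temperature(logits, token_ids, d):
            b = d
        else:
            a = c
    return (a + b) / 2.0
```

On synthetic data sampled from a known temperature, the estimate recovers that temperature; applied to human-written text, it yields the temperature under which the model would most plausibly have produced it.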