In domains like medicine and finance, large-scale labeled data is costly and often unavailable, leading to models trained on small datasets that struggle to generalize to real-world populations. Large language models (LLMs) contain extensive knowledge from years of published research across these domains. We propose LoID (Logit-Informed Distributions), a deterministic method for extracting informative prior distributions for Bayesian logistic regression by directly accessing an LLM's token-level predictions. Rather than relying on generated text, we probe the model's confidence in opposing semantic directions (positive vs. negative impact) through carefully constructed sentences. By measuring how consistently the LLM favors one direction across diverse phrasings, we extract the strength and reliability of the model's belief about each feature's influence. We evaluate LoID on ten real-world tabular datasets under synthetic out-of-distribution (OOD) settings characterized by covariate shift, where the training data represents only a subset of the population. We compare our approach against (1) standard uninformative priors, (2) AutoElicit, a recent method that prompts LLMs to generate priors via text completions, (3) LLMProcesses, a method that uses LLMs to generate numerical predictions through in-context learning, and (4) an oracle-style upper bound obtained by fitting logistic regression on the full dataset. We assess performance using the Area Under the Curve (AUC). Across datasets, LoID significantly improves over logistic regression trained on OOD data, recovering up to \textbf{59\%} of the performance gap relative to the oracle model. LoID outperforms AutoElicit and LLMProcesses on 8 out of 10 datasets, while providing a reproducible and computationally efficient mechanism for integrating LLM knowledge into Bayesian inference.
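The core elicitation step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's exact procedure: the helper `loid_prior`, the input format (per-phrasing probabilities of the "positive impact" direction), and the mapping from cross-phrasing consistency to prior width are all hypothetical.

```python
import math

def loid_prior(direction_probs):
    """Hypothetical sketch of the LoID idea: given, for each paraphrased
    probe sentence, the LLM's probability (from its logits) that a feature
    has a *positive* impact rather than a negative one, produce a Gaussian
    prior (mu, sigma) over that feature's logistic-regression coefficient."""
    # Signed belief per phrasing: +1 = fully positive, -1 = fully negative.
    signed = [2.0 * p - 1.0 for p in direction_probs]
    n = len(signed)
    mean = sum(signed) / n
    # Low variance across phrasings -> consistent belief -> tighter prior.
    var = sum((s - mean) ** 2 for s in signed) / n
    prior_mu = mean                          # prior mean of the coefficient
    prior_sigma = max(math.sqrt(var), 0.1)   # floor keeps the prior proper
    return prior_mu, prior_sigma

# Example: the LLM consistently favors "positive impact" across phrasings,
# yielding a positive prior mean and a narrow prior width.
mu, sigma = loid_prior([0.9, 0.85, 0.95, 0.88])
```

Because the probabilities come from the model's logits rather than sampled text, the same prompts always yield the same prior, which is what makes the procedure deterministic and reproducible.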