We present a method that integrates Large Language Models (LLMs) with traditional tabular data classification techniques, addressing LLM weaknesses such as sensitivity to data serialization and biases. We introduce two strategies that use LLMs to rank categorical variables and to generate priors on the correlations between continuous variables and targets, improving performance in few-shot scenarios. We focus on Logistic Regression and introduce MonotonicLR, which employs a non-linear monotonic function to map ordinals to cardinals while preserving the LLM-determined order. Evaluation against baseline models shows that our approach outperforms them, especially in low-data scenarios, while remaining interpretable.
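To make the idea of an order-preserving ordinal-to-cardinal mapping concrete, the following is a minimal sketch and not the MonotonicLR implementation itself. The abstract does not specify the form of the monotonic function, so the parametrization here (a cumulative sum of softplus-transformed increments) is an assumption, as are the feature name, the category order, and all function and parameter names.

```python
import numpy as np

def softplus(x):
    # Numerically stable log(1 + exp(x)); the result is always strictly positive.
    return np.logaddexp(0.0, x)

def monotonic_values(raw_increments):
    """Map ordinal ranks 0..K-1 to cardinal values that strictly increase with rank.

    Assumption: each gap between consecutive ranks is softplus(raw parameter),
    so any real-valued parameters yield an order-preserving mapping.
    """
    steps = softplus(np.asarray(raw_increments, dtype=float))
    return np.concatenate(([0.0], np.cumsum(steps)))

def predict_proba(ranks, raw_increments, weight, bias):
    """Logistic regression applied to the cardinal value assigned to each rank."""
    values = monotonic_values(raw_increments)
    logits = weight * values[np.asarray(ranks)] + bias
    return 1.0 / (1.0 + np.exp(-logits))

# Hypothetical order, as an LLM might return it for an "education" feature
# (lowest to highest); not taken from the paper.
llm_order = ["primary", "secondary", "bachelor", "master"]
rank_of = {category: rank for rank, category in enumerate(llm_order)}

# K - 1 free parameters; in practice these would be fitted jointly with
# the regression weight and bias. The numbers here are arbitrary.
raw_increments = np.array([-1.0, 0.5, 2.0])
values = monotonic_values(raw_increments)

ranks = [rank_of[c] for c in ["primary", "master", "bachelor"]]
probs = predict_proba(ranks, raw_increments, weight=0.8, bias=-1.0)
```

Because the gaps between consecutive ranks are unconstrained in size but always positive, the spacing between categories can be non-linear while the order supplied by the LLM is never violated. The single weight and bias keep the model as readable as ordinary logistic regression.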