Large language models (LLMs) have shown exceptional performance in text processing. Notably, LLMs can synthesize information from large datasets and explain their decisions in a way that resembles human reasoning through a chain of thought (CoT). An emerging application of LLMs is the handling and interpretation of numerical data, where fine-tuning improves performance over basic inference methods. This paper proposes a novel approach to training LLMs using knowledge transfer from a random forest (RF) ensemble, leveraging its efficiency and accuracy. By converting RF decision paths into natural language statements, we generate outputs for LLM fine-tuning, enhancing the model's ability to classify and to explain its decisions. Our method also verifies these rules using established classification metrics to confirm their correctness. Finally, we examine how preprocessing techniques affect the representation of numerical data and, in turn, classification accuracy and rule correctness.
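To make the idea of turning RF decision paths into natural language statements more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy forest built by hand from two hypothetical classes, `Leaf` and `Split`; the feature names, thresholds, class labels, and sentence template are all illustrative. In practice the splits would be read from a trained ensemble rather than written out manually.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str  # class predicted at this leaf


@dataclass
class Split:
    feature: str       # name of the feature tested at this node
    threshold: float   # go left if value <= threshold, otherwise right
    left: "Node"
    right: "Node"


Node = Union[Leaf, Split]


def path_to_conditions(tree, sample):
    """Walk one tree for one sample; return (label, list of worded conditions)."""
    conditions = []
    node = tree
    while isinstance(node, Split):
        value = sample[node.feature]
        if value <= node.threshold:
            conditions.append(f"{node.feature} ({value}) is at most {node.threshold}")
            node = node.left
        else:
            conditions.append(f"{node.feature} ({value}) is greater than {node.threshold}")
            node = node.right
    return node.label, conditions


def forest_explanation(forest, sample):
    """Majority-vote over the trees and word the paths that support the winner."""
    results = [path_to_conditions(tree, sample) for tree in forest]
    votes = Counter(label for label, _ in results)
    winner = votes.most_common(1)[0][0]
    # Keep only conditions from trees that agree with the majority,
    # removing duplicates while preserving their first-seen order.
    reasons = list(dict.fromkeys(
        cond for label, conds in results if label == winner for cond in conds
    ))
    text = (
        f"Predicted class: {winner} ({votes[winner]} of {len(forest)} trees agree). "
        f"Reasons: " + "; ".join(reasons) + "."
    )
    return winner, text


# Hypothetical three-tree forest and one sample (illustrative values only).
forest = [
    Split("petal_length", 2.5, Leaf("setosa"), Leaf("versicolor")),
    Split("petal_width", 0.8, Leaf("setosa"),
          Split("petal_length", 4.9, Leaf("versicolor"), Leaf("virginica"))),
    Split("sepal_length", 5.0, Leaf("setosa"), Leaf("versicolor")),
]
sample = {"petal_length": 1.4, "petal_width": 0.2, "sepal_length": 5.1}

label, text = forest_explanation(forest, sample)
print(text)
```

A statement like `text`, paired with a textual rendering of `sample`, could serve as one fine-tuning target. Restricting the reasons to trees that voted with the majority is one possible design choice; the abstract does not specify how paths from disagreeing trees are handled.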