We present a study on integrating Large Language Models (LLMs) into tabular data classification, with an emphasis on an efficient framework. Building on TabLLM (arXiv:2210.10723), we introduce three novel serialization techniques, most notably a LaTeX serialization method. This method significantly boosts the performance of LLMs on domain-specific datasets. It is also memory-efficient and can fully exploit complex data structures. Through extensive experiments covering several serialization approaches, such as feature combination and feature importance, we demonstrate that our approach outperforms traditional models in both accuracy and efficiency.
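To make the idea of LaTeX serialization concrete, the following is a minimal sketch of how a single table row might be rendered as a LaTeX `tabular` snippet before being passed to an LLM. The abstract does not specify the exact template, so the function names (`escape_latex`, `serialize_row_latex`), the one-header-row layout, and the example columns are all assumptions made for illustration, not the authors' implementation.

```python
def escape_latex(value) -> str:
    """Escape characters that have special meaning in LaTeX.

    Escaping is done character by character so that an inserted
    backslash is never escaped a second time.
    """
    replacements = {
        "\\": r"\textbackslash{}",
        "&": r"\&",
        "%": r"\%",
        "$": r"\$",
        "#": r"\#",
        "_": r"\_",
        "{": r"\{",
        "}": r"\}",
        "~": r"\textasciitilde{}",
        "^": r"\textasciicircum{}",
    }
    return "".join(replacements.get(ch, ch) for ch in str(value))


def serialize_row_latex(row: dict) -> str:
    """Render one record as a two-line LaTeX tabular (header + values).

    Hypothetical layout: the template used in the paper may differ.
    """
    column_spec = "|".join("l" for _ in row)
    header = " & ".join(escape_latex(name) for name in row)
    values = " & ".join(escape_latex(val) for val in row.values())
    lines = [
        r"\begin{tabular}{|" + column_spec + "|}",
        r"\hline",
        header + r" \\",
        r"\hline",
        values + r" \\",
        r"\hline",
        r"\end{tabular}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative record; the column names are made up for this sketch.
    example = {"age": 39, "work_class": "Private", "hours_per_week": 40}
    print(serialize_row_latex(example))
```

The resulting string could then be placed in a classification prompt alongside a task description. How the prompt is worded, and whether one row or several are serialized per table, are design choices the abstract leaves open.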