The pervasive deployment of Large Language Models (LLMs) across sectors often neglects the nuanced requirements of individuals and small organizations, who benefit more from models precisely tailored to their specific business contexts than from models with broadly superior general capabilities. This work introduces \textbf{AnyTaskTune}, a novel fine-tuning methodology we term \textbf{Task-Fine-Tune}, developed to elevate model performance on a diverse array of domain-specific tasks. The method first identifies and defines targeted sub-tasks within a domain, then constructs specialized enhancement datasets for fine-tuning, thereby optimizing task-specific model performance. We conducted comprehensive fine-tuning experiments not only in the legal domain, on tasks such as keyword extraction and sentence prediction, but across more than twenty sub-tasks drawn from finance, healthcare, law, psychology, consumer services, and human resources. To substantiate our approach and facilitate community engagement, we will open-source these bilingual task datasets. Our findings demonstrate that models fine-tuned with the \textbf{Task-Fine-Tune} methodology not only achieve superior performance on their specific tasks but also significantly outperform models with higher general capabilities within the corresponding domains. Our work is publicly available at \url{https://github.com/PandaVT/DataTager}.
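As a hypothetical illustration of the dataset-construction step (the abstract does not specify a record schema; the field names and the helper below follow common instruction-tuning conventions and are assumptions, not the paper's actual format), a training example for the legal keyword-extraction sub-task might be an instruction–input–output triple serialized as one JSONL line for supervised fine-tuning:

```python
import json

# Hypothetical record for the legal keyword-extraction sub-task.
# The "instruction"/"input"/"output" fields are illustrative only.
record = {
    "instruction": "Extract the legal keywords from the following sentence.",
    "input": "The defendant was charged with breach of contract and fraud.",
    "output": "breach of contract; fraud",
}

def to_training_text(rec: dict) -> str:
    """Flatten a record into a single prompt/response string for SFT."""
    return (
        f"### Instruction:\n{rec['instruction']}\n"
        f"### Input:\n{rec['input']}\n"
        f"### Response:\n{rec['output']}"
    )

# One JSON object per line is a common on-disk format for such datasets.
jsonl_line = json.dumps(record, ensure_ascii=False)
print(to_training_text(record))
```

Collecting many such triples per sub-task, in both languages, would yield the kind of bilingual task-specific dataset the abstract describes.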