The emergence of large language models (LLMs) has revolutionized natural language processing tasks. However, existing instruction-tuning datasets suffer from occupational bias: most of their data relates to only a few occupations, which hinders instruction-tuned LLMs from generating helpful responses to professional queries from practitioners in specific fields. To mitigate this issue and promote occupation-inclusive LLMs, we create an instruction-tuning dataset named \emph{OccuQuest}, which contains 110,000+ prompt-completion pairs and 30,000+ dialogues covering over 1,000 occupations in 26 occupational categories. We systematically query ChatGPT, organizing requests hierarchically by Occupation, Responsibility, Topic, and Question, to ensure comprehensive coverage of occupation-specific inquiries. Compared with three commonly used datasets (Dolly, ShareGPT, and WizardLM), OccuQuest exhibits a more balanced distribution across occupations. Furthermore, we assemble three test sets for comprehensive evaluation: an occu-test set covering 25 occupational categories, an estate set focusing on real estate, and an occu-quora set containing real-world questions from Quora. We then fine-tune LLaMA on OccuQuest to obtain OccuLLaMA, which significantly outperforms state-of-the-art LLaMA variants (Vicuna, Tulu, and WizardLM) on professional questions in both GPT-4 and human evaluations. Notably, on the occu-quora set, OccuLLaMA achieves a win rate of 86.4\% against WizardLM.
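The hierarchical organization of requests (Occupation, then Responsibility, then Topic, then Question) can be illustrated as a breadth-first expansion in which each level is generated conditioned on the path above it. The following is a minimal sketch under stated assumptions, not the authors' pipeline: the abstract does not specify prompts or how many items are requested per level, so the `generate` callback and the `expand_queries` function are hypothetical, and in practice `generate` would wrap a ChatGPT request rather than the offline stub used here.

```python
from typing import Callable, List, Tuple

# Hypothetical callback: given the level to produce and the hierarchy path built so far,
# return the items for that level (in a real pipeline, an LLM request would go here).
Generator = Callable[[str, Tuple[str, ...]], List[str]]

# Levels below the root occupation, in the order they are expanded.
LEVELS = ("Responsibility", "Topic", "Question")


def expand_queries(occupation: str, generate: Generator) -> List[Tuple[str, ...]]:
    """Expand Occupation -> Responsibility -> Topic -> Question breadth-first.

    Returns one (occupation, responsibility, topic, question) tuple per leaf question,
    so each question keeps the full occupational context it was derived from.
    """
    paths: List[Tuple[str, ...]] = [(occupation,)]
    for level in LEVELS:
        # Extend every path at the current depth by each child the generator proposes.
        paths = [path + (child,) for path in paths for child in generate(level, path)]
    return paths


if __name__ == "__main__":
    # Offline stub standing in for ChatGPT: two items per level.
    def toy_generate(level: str, path: Tuple[str, ...]) -> List[str]:
        return [f"{level.lower()} {i}" for i in range(2)]

    for p in expand_queries("Real estate agent", toy_generate):
        print(" > ".join(p))
```

With a fan-out of k items per level, each occupation yields k^3 leaf questions (8 for the stub above), and keeping the full path makes it straightforward to check how evenly questions are spread across occupations.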