This paper presents the development and evaluation of ChatHome, a domain-specific language model (DSLM) designed for the intricate field of home renovation. Given the demonstrated capabilities of large language models (LLMs) such as GPT-4 and the growing interest in home renovation, this study aims to bridge the two by building a dedicated model that produces accurate, reliable outputs for home renovation tasks. ChatHome's novelty lies in its methodology, which combines domain-adaptive pretraining with instruction-tuning on an extensive dataset of professional articles, standard documents, and web content related to home renovation. This two-pronged strategy is designed to let the model absorb comprehensive domain knowledge and respond effectively to user queries. Through extensive experiments on both general and domain-specific datasets, including the newly introduced "EvalHome" domain dataset, we show that ChatHome improves domain-specific capabilities while preserving its general versatility.