ChipNeMo aims to explore the applications of large language models (LLMs) for industrial chip design. Instead of directly deploying off-the-shelf commercial or open-source LLMs, we adopt the following domain adaptation techniques: domain-adaptive tokenization, domain-adaptive continued pretraining, model alignment with domain-specific instructions, and domain-adapted retrieval models. We evaluate these methods on three selected LLM applications for chip design: an engineering assistant chatbot, EDA script generation, and bug summarization and analysis. Our evaluations demonstrate that domain-adaptive pretraining of language models can lead to superior performance on domain-related downstream tasks compared to their base LLaMA2 counterparts, without degradation in generic capabilities. In particular, our largest model, ChipNeMo-70B, outperforms the highly capable GPT-4 on two of our use cases, namely the engineering assistant chatbot and EDA script generation, while exhibiting competitive performance on bug summarization and analysis. These results underscore the potential of domain-specific customization for enhancing the effectiveness of large language models in specialized applications.