Recent advancements in large language models (LLMs) have expanded their application across various domains, including chip design, where domain-adapted chip models like ChipNeMo have emerged. However, these models often struggle with instruction alignment, a crucial capability for LLMs that involves following explicit human directives. This limitation impedes the practical application of chip LLMs, including serving as assistant chatbots for hardware design engineers. In this work, we introduce ChipAlign, a novel approach that utilizes a training-free model merging strategy, combining the strengths of a general instruction-aligned LLM with a chip-specific LLM. By considering the underlying manifold in the weight space, ChipAlign employs geodesic interpolation to effectively fuse the weights of input LLMs, producing a merged model that inherits strong instruction alignment and chip expertise from the respective instruction and chip LLMs. Our results demonstrate that ChipAlign significantly enhances instruction-following capabilities of existing chip LLMs, achieving up to a 26.6% improvement on the IFEval benchmark, while maintaining comparable expertise in the chip domain. This improvement in instruction alignment also translates to notable gains in instruction-involved QA tasks, delivering performance enhancements of 3.9% on the OpenROAD QA benchmark and 8.25% on production-level chip QA benchmarks, surpassing state-of-the-art baselines.