As Moore's Law continues to drive up the complexity of electronic systems, Electronic Design Automation (EDA) must advance to meet global demand. An important example of an EDA technology is SKILL, a scripting language used to customize and extend EDA software. Recently, code generation models based on the transformer architecture have achieved impressive results in academic settings and have even been deployed in commercial developer tools to improve developer productivity. To the best of our knowledge, this study is the first to apply transformers to SKILL code autocompletion, with the goal of improving the productivity of hardware design engineers. We propose and experimentally validate a novel, data-efficient methodology for generating SKILL code. More specifically, the methodology comprises (i) the creation of a high-quality SKILL dataset containing both unlabeled and labeled data, (ii) a training strategy in which T5 models pre-trained on general programming language code are fine-tuned on our custom SKILL dataset using unsupervised and supervised learning, and (iii) the evaluation of synthesized SKILL code. We show that models trained using the proposed methodology outperform baselines in terms of human-judgment score and BLEU score. A major challenge was the extremely small amount of SKILL code data available for training a transformer model. Despite our validated improvements, this dataset was still not enough to train a model that can reliably autocomplete SKILL code. We discuss this and other limitations, as well as future work that could address them.