A Machine Learning Approach Towards SKILL Code Autocompletion

As Moore's Law continues to drive up the complexity of electronic systems, Electronic Design Automation (EDA) must advance to meet global demand. One important EDA technology is SKILL, a scripting language used to customize and extend EDA software. Recently, code generation models based on the transformer architecture have achieved impressive results in academic settings and have even been deployed in commercial developer tools to improve developer productivity. To the best of our knowledge, this study is the first to apply transformers to SKILL code autocompletion, with the goal of improving the productivity of hardware design engineers. We propose and experimentally validate a novel, data-efficient methodology for generating SKILL code. More specifically, the methodology covers (i) creating a high-quality SKILL dataset containing both unlabeled and labeled data, (ii) training, in which T5 models pre-trained on general programming-language code are fine-tuned on our custom SKILL dataset using unsupervised and supervised learning, and (iii) evaluating the synthesized SKILL code. We show that models trained with the proposed methodology outperform baselines in terms of both human-judgment score and BLEU score. A major challenge was the extremely small amount of SKILL code available for training a transformer model to generate SKILL code. Despite the validated improvements, this dataset was still too small to train a model that can reliably autocomplete SKILL code. We discuss this and other limitations, as well as future work that could address them.
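Because the abstract reports BLEU scores, here is a minimal sketch of how sentence-level BLEU can be computed between a generated SKILL completion and a reference snippet. It assumes whitespace tokenization, a single reference, uniform weights over 1- to 4-grams, and no smoothing. These choices, and the helper names `ngrams` and `sentence_bleu`, are illustrative assumptions rather than the evaluation setup used in this study.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform weights over 1..max_n grams.

    Tokenization is plain whitespace splitting (an assumption for illustration;
    code-generation evaluations often use a language-aware tokenizer instead).
    """
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Modified precision: clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, any zero precision makes BLEU zero
        log_precisions.append(math.log(overlap / total))

    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


ref = "foreach( x myList println( x ) )"
print(sentence_bleu(ref, ref))  # 1.0 for an exact match
```

Without smoothing, short completions that share no 4-gram with the reference score exactly 0. This is one reason sentence-level BLEU is usually reported alongside human judgment, as this study does.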