A Machine Learning Approach Towards SKILL Code Autocompletion

As Moore's Law continues to increase the complexity of electronic systems, Electronic Design Automation (EDA) must advance to meet global demand. An important EDA technology is SKILL, a scripting language used to customize and extend EDA software. Recently, code generation models based on the transformer architecture have achieved impressive results in academic settings and have even been deployed in commercial tools to improve developer productivity. To the best of our knowledge, this study is the first to apply transformers to SKILL code autocompletion, with the aim of improving the productivity of hardware design engineers. We propose and experimentally validate a novel, data-efficient methodology for generating SKILL code. Specifically, the methodology covers (i) creating a high-quality SKILL dataset containing both unlabeled and labeled data, (ii) a training strategy in which T5 models pre-trained on general programming language code are fine-tuned on our custom SKILL dataset using unsupervised and supervised learning, and (iii) evaluating the synthesized SKILL code. Models trained with the proposed methodology outperform baselines in both human-judgment score and BLEU score. A major challenge was the extremely small amount of SKILL code available for training a transformer model. Despite these validated improvements, the dataset available to us was still too small to train a model that can reliably autocomplete SKILL code. We discuss this and other limitations, as well as future work that could address them.
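To make the BLEU metric mentioned above concrete, the following is a minimal sketch of sentence-level BLEU: clipped n-gram precisions for n = 1..4, combined by a geometric mean and multiplied by a brevity penalty. It is an illustration only, not the evaluation code used in the study. The whitespace tokenization, the lack of smoothing, the function names `ngrams` and `sentence_bleu`, and the toy SKILL-like snippets are all assumptions made for this example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference, candidate, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform n-gram weights.

    Both arguments are lists of tokens. This is a hypothetical helper,
    not the scoring script from the paper.
    """
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            # Without smoothing, one zero precision makes the score zero.
            return 0.0
        log_precision_sum += math.log(overlap / total)
    # Brevity penalty: penalize candidates shorter than the reference.
    if len(candidate) >= len(reference):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(reference) / len(candidate))
    return brevity_penalty * math.exp(log_precision_sum / max_n)


# Toy SKILL-like snippets, tokenized on whitespace (an assumption).
reference = "procedure( add(a b) a + b )".split()
exact = "procedure( add(a b) a + b )".split()
near_miss = "procedure( add(a b) a + b ]".split()
unrelated = "foreach( x lst println(x) )".split()

score_exact = sentence_bleu(reference, exact)
score_near = sentence_bleu(reference, near_miss)
score_unrelated = sentence_bleu(reference, unrelated)
```

An exact match scores 1.0, a completion that differs in one token scores strictly between 0 and 1, and a completion with no shared tokens scores 0. Because BLEU only measures surface n-gram overlap, it cannot tell whether generated code is syntactically valid or functionally correct, which is one reason to report it alongside a human-judgment score.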

