Implant Global and Local Hierarchy Information to Sequence based Code Representation Models


March 14, 2023


Kechi Zhang, Zhuo Li, Zhi Jin, Ge Li

From arXiv; accepted by ICPC 2023

Source code representation with deep learning techniques is an important research field. Many studies learn sequential or structural information for code representation, but sequence-based and non-sequence-based models both have limitations. Researchers have attempted to incorporate structural information into sequence-based models, but they mine only part of the token-level hierarchical structure information. In this paper, we analyze how the complete hierarchical structure influences the tokens in code sequences and abstract this influence as a property of code tokens called hierarchical embedding. The hierarchical embedding is further divided into statement-level global hierarchy and token-level local hierarchy. Furthermore, we propose the Hierarchy Transformer (HiT), a simple but effective sequence model that incorporates the complete hierarchical embeddings of source code into a Transformer model. We demonstrate the effectiveness of hierarchical embedding for learning code structure with an experiment on a variable scope detection task. Further evaluation shows that HiT outperforms SOTA baseline models and shows stable training efficiency on three source code-related tasks, covering classification and generation, across 8 different datasets.
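The split between statement-level global hierarchy and token-level local hierarchy can be illustrated with a small sketch. This is not the authors' implementation: it assumes, for illustration only, that the global hierarchy of a token is the chain of enclosing statement nodes in the syntax tree and that the local hierarchy is the chain of expression nodes inside the innermost statement. The function name `hierarchy_paths` and the sample snippet are hypothetical, and Python's standard `ast` module stands in for whatever parser the paper uses.

```python
import ast

# Hypothetical sample program: `b` is assigned inside an `if` and returned outside it.
SAMPLE = "def f(a):\n    if a:\n        b = a + 1\n    return b\n"


def hierarchy_paths(source):
    """Return (identifier, global_path, local_path) for every ast.Name in source.

    Assumption (not taken from the paper): global_path lists the enclosing
    statement node types from the root down; local_path lists the node types
    from the innermost statement down to the identifier itself.
    """
    tree = ast.parse(source)
    results = []

    def visit(node, global_path, local_path):
        if isinstance(node, ast.stmt):
            # Entering a statement extends the global path and resets the local one.
            global_path = global_path + [type(node).__name__]
            local_path = []
        elif not isinstance(node, ast.Module):
            # Any other node below a statement extends the local path.
            local_path = local_path + [type(node).__name__]
        if isinstance(node, ast.Name):
            results.append((node.id, tuple(global_path), tuple(local_path)))
        for child in ast.iter_child_nodes(node):
            visit(child, global_path, local_path)

    visit(tree, [], [])
    return results


if __name__ == "__main__":
    for name, global_path, local_path in hierarchy_paths(SAMPLE):
        print(name, global_path, local_path)
```

In this sketch the two occurrences of `b` share the same local path but differ in their global paths (one sits under `If`, the other directly under `FunctionDef`), which is the kind of structural signal a variable scope detection task would depend on. A model along the lines described in the abstract would then map such paths to embedding vectors and combine them with the token embeddings; that step is omitted here.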

