CodeLabeller: A Web-based Code Annotation Tool for Java Design Patterns and Summaries - 专知论文

会员服务 ·

0

设计模式 · 标注 · Java · Learning · Processing（编程语言） ·

2023 年 3 月 13 日

CodeLabeller: A Web-based Code Annotation Tool for Java Design Patterns and Summaries

翻译：CodeLabeller：一种基于Web的Java设计模式与摘要代码标注工具

Najam Nazar,Norman Chen,Chun Yong Chong

from arxiv, 15 pages, 5 Figures, 6 Tables

While constructing supervised learning models, we require labelled examples to build a corpus and train a machine learning model. However, most studies have built the labelled dataset manually, which in many occasions is a daunting task. To mitigate this problem, we have built an online tool called CodeLabeller. CodeLabeller is a web-based tool that aims to provide an efficient approach to handling the process of labelling source code files for supervised learning methods at scale by improving the data collection process throughout. CodeLabeller is tested by constructing a corpus of over a thousand source files obtained from a large collection of open source Java projects and labelling each Java source file with their respective design patterns and summaries. Twenty five experts in the field of software engineering participated in a usability evaluation of the tool using the standard User Experience Questionnaire online survey. The survey results demonstrate that the tool achieves the Good standard on hedonic and pragmatic quality standards, is easy to use and meets the needs of the annotating the corpus for supervised classifiers. Apart from assisting researchers in crowdsourcing a labelled dataset, the tool has practical applicability in software engineering education and assists in building expert ratings for software artefacts.

翻译：在构建监督学习模型时，我们需要带有标注的样本来建立语料库并训练机器学习模型。然而，大多数研究都是手动构建标注数据集，这在许多情况下是一项艰巨的任务。为解决这一问题，我们开发了一个名为CodeLabeller的在线工具。CodeLabeller是一种基于Web的工具，旨在通过全程改进数据收集过程，提供一种高效处理大规模源代码文件标注流程的方法，以支持监督学习方法。我们通过构建一个包含从大量开源Java项目中获取的千余个源文件的语料库，并对每个Java源文件标注其相应的设计模式和摘要，对CodeLabeller进行了测试。二十五位软件工程领域的专家参与了该工具的用户体验评估，使用标准用户体验问卷在线调查。调查结果表明，该工具在享乐质量和实用质量标准上达到"良好"水平，易于使用，并满足为监督分类器标注语料库的需求。除了帮助研究人员众包构建标注数据集外，该工具在软件工程教育中具有实际应用价值，并能辅助构建软件制品专家评分。

0

相关内容

设计模式

设计模式（Design Pattern）是一套被反复使用、多数人知晓的、经过分类的、代码设计经验的总结。

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

专知会员服务

109+阅读 · 2020年5月1日

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

专知会员服务

93+阅读 · 2020年2月12日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【深度学习表格检测、信息提取和结构化】《Table Detection, Information Extraction and Structuring using Deep Learning》by Vihar Kurama

专知会员服务

38+阅读 · 2020年1月23日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

164+阅读 · 2019年10月12日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

专知

13+阅读 · 2018年6月24日

【推荐】NiftyNet：面向医学图像分析和图像引导治疗的开源CNN平台（附代码）

【推荐】NiftyNet：面向医学图像分析和图像引导治疗的开源CNN平台（附代码）

机器学习研究会

13+阅读 · 2018年1月27日

最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

专知

23+阅读 · 2018年1月18日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【推荐】MXNet深度情感分析实战

【推荐】MXNet深度情感分析实战

机器学习研究会

16+阅读 · 2017年10月4日

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

NASICON结构锂快离子导体晶粒与晶界共调控及机理研究

国家自然科学基金

0+阅读 · 2015年12月31日

非小细胞肺癌中黄连素干预TF/FVIIa通路抑制转移的作用研究

国家自然科学基金

0+阅读 · 2014年12月31日

半导体衬底上FeSe薄膜的外延生长及界面超导

国家自然科学基金

0+阅读 · 2013年12月31日

海水中羟基多溴联苯醚和溴代二噁英的光化学形成机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

纤维结构不良中破骨细胞过度激活的分子机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

锂离子电池三维中空锡合金/高分子膜核壳纳米线阵列电极的稳定化研究

国家自然科学基金

0+阅读 · 2011年12月31日

基于Decorin基因甲基化调控的非小细胞肺癌转移的分子机制

国家自然科学基金

0+阅读 · 2011年12月31日

SARI基因在肺癌侵袭转移中的作用及分子机制

国家自然科学基金

0+阅读 · 2009年12月31日

shRNA干扰mTOR信号途径抑制镍诱导的Cap43基因表达的机制研究

国家自然科学基金

0+阅读 · 2008年12月31日

Unified Framework for Identity and Imagined Action Recognition from EEG patterns

Arxiv

0+阅读 · 2023年5月2日

Neural Architecture Search for Visual Anomaly Segmentation

Arxiv

0+阅读 · 2023年5月2日

Adversarial Representation Learning for Robust Privacy Preservation in Audio

Arxiv

0+阅读 · 2023年4月29日

Building the MSR Tool Kaiaulu: Design Principles and Experiences

Arxiv

0+阅读 · 2023年4月28日

A Survey of Deep Graph Clustering: Taxonomy, Challenge, and Application

Arxiv

13+阅读 · 2022年11月23日

Deep Learning for Time Series Anomaly Detection: A Survey

Arxiv

21+阅读 · 2022年11月9日

Domain Generalization in Vision: A Survey

Arxiv

17+阅读 · 2021年7月18日

Deep Neural Network Based Relation Extraction: An Overview

Arxiv

14+阅读 · 2021年1月6日

Embedding-based Retrieval in Facebook Search

Arxiv

12+阅读 · 2020年6月20日

Deep learning for time series classification: a review

Arxiv

12+阅读 · 2019年3月14日

VIP会员

文章信息

相关主题

Processing（编程语言）

最新内容

《无人水面艇文献综述与结构设计》135页

《无人水面艇文献综述与结构设计》135页

专知会员服务

9+阅读 · 6月13日

《自主蜂群系统的战略架构：多域一体化、抗毁韧性及海上作战框架（2025—2035）》46页报告

《自主蜂群系统的战略架构：多域一体化、抗毁韧性及海上作战框架（2025—2035）》46页报告

专知会员服务

9+阅读 · 6月13日

ICML 2026｜MEMOPILOT：用强化学习训练会进化的智能体记忆

ICML 2026｜MEMOPILOT：用强化学习训练会进化的智能体记忆

专知会员服务

2+阅读 · 6月13日

智能体时间序列系统全景综述：架构、可靠性与研究前沿

智能体时间序列系统全景综述：架构、可靠性与研究前沿

专知会员服务

9+阅读 · 6月13日

AUTOLAB：86亿Token实测前沿模型的长程自动科研能力

AUTOLAB：86亿Token实测前沿模型的长程自动科研能力

专知会员服务

8+阅读 · 6月12日

CVPR 2026趋势报告：视觉AI正在走向世界模型与物理智能，165页ppt

CVPR 2026趋势报告：视觉AI正在走向世界模型与物理智能，165页ppt

专知会员服务

20+阅读 · 6月12日

乌克兰战场背后的新武器

乌克兰战场背后的新武器

专知会员服务

7+阅读 · 6月12日

《信任但需验证：军事决策背景下的大型语言模型品格、能力与控制》2026最新59页报告

《信任但需验证：军事决策背景下的大型语言模型品格、能力与控制》2026最新59页报告

专知会员服务

12+阅读 · 6月12日

未来战争：乌克兰2026年反攻中的作战经验教训 - 新军事战略之“后勤封锁”（中文下载）

未来战争：乌克兰2026年反攻中的作战经验教训 - 新军事战略之“后勤封锁”（中文下载）

专知会员服务

9+阅读 · 6月12日

基于博弈论的陆军人机协同（长文报告）

基于博弈论的陆军人机协同（长文报告）

专知会员服务

13+阅读 · 6月12日

《天气对反无人机系统“探测-跟踪-识别-失效”链路的影响：俄乌战场分析》

《天气对反无人机系统“探测-跟踪-识别-失效”链路的影响：俄乌战场分析》

专知会员服务

12+阅读 · 6月12日

美国陆军航空兵：以愿景引领转型

美国陆军航空兵：以愿景引领转型

专知会员服务

7+阅读 · 6月12日

CVPR 2026教程｜扩散模型原理：连续、离散与实时生成

CVPR 2026教程｜扩散模型原理：连续、离散与实时生成

专知会员服务

7+阅读 · 6月11日

重磅综述｜大模型智能体环境工程：建模、合成、评估与协同演化

重磅综述｜大模型智能体环境工程：建模、合成、评估与协同演化

专知会员服务

9+阅读 · 6月11日

面向特种部队的、以操作员为中心的人工智能决策支持系统框架

面向特种部队的、以操作员为中心的人工智能决策支持系统框架

专知会员服务

9+阅读 · 6月11日

相关VIP内容

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

专知会员服务

109+阅读 · 2020年5月1日

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

专知会员服务

93+阅读 · 2020年2月12日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【深度学习表格检测、信息提取和结构化】《Table Detection, Information Extraction and Structuring using Deep Learning》by Vihar Kurama

专知会员服务

38+阅读 · 2020年1月23日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

164+阅读 · 2019年10月12日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《自主蜂群系统的战略架构：多域一体化、抗毁韧性及海上作战框架（2025—2035）》46页报告

智能体时间序列系统全景综述：架构、可靠性与研究前沿

《无人水面艇文献综述与结构设计》135页

ICML 2026｜MEMOPILOT：用强化学习训练会进化的智能体记忆

相关资讯

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

专知

13+阅读 · 2018年6月24日

【推荐】NiftyNet：面向医学图像分析和图像引导治疗的开源CNN平台（附代码）

【推荐】NiftyNet：面向医学图像分析和图像引导治疗的开源CNN平台（附代码）

机器学习研究会

13+阅读 · 2018年1月27日

最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

专知

23+阅读 · 2018年1月18日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【推荐】MXNet深度情感分析实战

【推荐】MXNet深度情感分析实战

机器学习研究会

16+阅读 · 2017年10月4日

相关论文

Unified Framework for Identity and Imagined Action Recognition from EEG patterns

Arxiv

0+阅读 · 2023年5月2日

Neural Architecture Search for Visual Anomaly Segmentation

Arxiv

0+阅读 · 2023年5月2日

Adversarial Representation Learning for Robust Privacy Preservation in Audio

Arxiv

0+阅读 · 2023年4月29日

Building the MSR Tool Kaiaulu: Design Principles and Experiences

Arxiv

0+阅读 · 2023年4月28日

A Survey of Deep Graph Clustering: Taxonomy, Challenge, and Application

Arxiv

13+阅读 · 2022年11月23日

Deep Learning for Time Series Anomaly Detection: A Survey

Arxiv

21+阅读 · 2022年11月9日

Domain Generalization in Vision: A Survey

Arxiv

17+阅读 · 2021年7月18日

Deep Neural Network Based Relation Extraction: An Overview

Arxiv

14+阅读 · 2021年1月6日

Embedding-based Retrieval in Facebook Search

Arxiv

12+阅读 · 2020年6月20日

Deep learning for time series classification: a review

Arxiv

12+阅读 · 2019年3月14日

相关基金

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

NASICON结构锂快离子导体晶粒与晶界共调控及机理研究

国家自然科学基金

0+阅读 · 2015年12月31日

非小细胞肺癌中黄连素干预TF/FVIIa通路抑制转移的作用研究

国家自然科学基金

0+阅读 · 2014年12月31日

半导体衬底上FeSe薄膜的外延生长及界面超导

国家自然科学基金

0+阅读 · 2013年12月31日

海水中羟基多溴联苯醚和溴代二噁英的光化学形成机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

纤维结构不良中破骨细胞过度激活的分子机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

锂离子电池三维中空锡合金/高分子膜核壳纳米线阵列电极的稳定化研究

国家自然科学基金

0+阅读 · 2011年12月31日

基于Decorin基因甲基化调控的非小细胞肺癌转移的分子机制

国家自然科学基金

0+阅读 · 2011年12月31日

SARI基因在肺癌侵袭转移中的作用及分子机制

国家自然科学基金

0+阅读 · 2009年12月31日

shRNA干扰mTOR信号途径抑制镍诱导的Cap43基因表达的机制研究

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员