Modelling Concurrency Bugs Using Machine Learning

Artificial intelligence has gained considerable traction in recent years, with machine learning in particular finding applications across a wide range of fields. One application of interest to us is software safety and security, especially in the context of parallel programs. Detecting concurrency bugs automatically has intrigued programmers for a long time, as the added layer of complexity makes concurrent programs more prone to failure. Automatic detection tools benefit programmers considerably by saving debugging time and reducing the number of unexpected bugs. We believe machine learning may help achieve this goal by offering advantages over current approaches in both overall tool accuracy and programming language flexibility. However, the machine learning approach brings numerous challenges of its own (correctly labelling a sufficiently large dataset, finding the best model types and architectures, and so forth), so we have to address each issue in developing such a tool separately. The focus of this project is therefore on comparing both common and recent machine learning approaches. We abstract away the complexity of procuring a labelled dataset of concurrent programs by defining and generating a synthetic dataset that aims to simulate real-life (concurrent) programs. We formulate hypotheses about fundamental limits of various machine learning model types, which we then validate by running extensive tests on our synthetic dataset. We hope that our findings provide more insight into the advantages and disadvantages of various model types when modelling programs using machine learning, as well as in other related fields (e.g. NLP).

