Go-Oracle: Automated Test Oracle for Go Concurrency Bugs

The Go programming language has gained significant traction for software development, especially for infrastructure systems. Nonetheless, concurrency bugs are prevalent in Go programs and pose a unique challenge because the language offers two concurrency mechanisms: communicating sequential processes (message passing over channels) and shared memory. Detecting concurrency bugs and accurately classifying program executions as pass or fail is extremely challenging, even for domain experts. A survey we conducted with expert developers at ByteDance confirmed this challenge. Our work addresses the test oracle problem for Go programs, that is, automatically classifying test executions as pass or fail. This problem has not been investigated in the literature for Go programs, owing to Go's distinctive programming model. Our approach collects both passing and failing execution traces from a set of subject Go programs, capturing a comprehensive array of execution events with the native Go execution tracer. We then preprocess and encode these traces and train a transformer-based neural network to classify each trace as passing or failing. We evaluate the approach on 8 subject programs drawn from the GoBench repository, which are routinely used as benchmarks in industry settings. Encouragingly, our test oracle, Go-Oracle, achieves high accuracy even when trained on a limited dataset, demonstrating the efficacy and potential of the methodology. Developers at ByteDance strongly agreed that they would use Go-Oracle in place of the current practice of manually inspecting test executions to classify them as pass or fail.
