Training a model to predict the next step in a concurrent program is harder than it looks: two runs of the same program from the same trace prefix can produce different next events, both valid, because the scheduler is nondeterministic. A model trained against a single label is learning to guess one outcome of a random process. We turn this around and use the nondeterminism as a training signal. We run each program many times, aggregate the observed next events into an empirical distribution, and fine-tune a 7B model to match that distribution with a KL objective. On 798 held-out predictions drawn from real production Go bugs (CockroachDB, Kubernetes, gRPC, etcd), fine-tuning on fewer than a thousand traces reaches 36.2% accuracy, ahead of Gemini 3.5 Flash used zero-shot (34.8%) and the same model without fine-tuning (28.6%). Distribution training matches cross-entropy on accuracy (35.8% vs. 36.2%) while reducing Expected Calibration Error from 0.205 to 0.169. We also derive a formal goroutine-leak signature for a class of select-blocked goroutines where P(GoUnblock)=0 holds by scheduler semantics, not by learning. We release the dataset, trained adapters, and all tooling.
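As a minimal sketch of the aggregation step described above, the snippet below builds an empirical next-event distribution from repeated runs and scores a prediction against it with a KL divergence. The event list, the function names, and the direction of the divergence (target relative to prediction) are illustrative assumptions, not details confirmed by the abstract.

```python
import math
from collections import Counter


def empirical_distribution(observed_events):
    """Turn the next events seen across repeated runs into probabilities."""
    counts = Counter(observed_events)
    total = sum(counts.values())
    return {event: n / total for event, n in counts.items()}


def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted) in nats.

    Assumption: the target distribution comes first. Events with zero
    target mass contribute nothing; predicted mass is floored at eps so
    the logarithm stays finite.
    """
    return sum(
        p * math.log(p / max(predicted.get(event, 0.0), eps))
        for event, p in target.items()
        if p > 0
    )


# Hypothetical next events from eight runs sharing one trace prefix.
runs = [
    "GoBlock", "GoUnblock", "GoBlock", "GoSched",
    "GoBlock", "GoUnblock", "GoBlock", "GoBlock",
]
target = empirical_distribution(runs)

# A prediction that puts all mass on the most frequent event,
# compared with one that reproduces the observed spread.
one_hot = {"GoBlock": 1.0}
matched = dict(target)

loss_one_hot = kl_divergence(target, one_hot)
loss_matched = kl_divergence(target, matched)
```

Under these assumptions, the single-outcome prediction is penalized heavily for the events it assigns no mass to, while the prediction that reproduces the observed spread incurs zero divergence.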