Determining whether a program terminates is a core challenge in program analysis with direct implications for correctness, verification, and security. We investigate whether transformer architectures can recognise termination patterns directly from source code and how their strengths can be amplified through ensembles. To overcome the extreme scarcity of non-terminating examples, we design an ensemble framework of compact transformer encoders, systematically trained with a suite of imbalance-aware loss functions and class-aware sampling techniques. By combining models trained with distinct loss functions, our ensembles achieve substantially stronger performance than any single transformer, outperforming both powerful off-the-shelf LLMs and graph-based methods. Finally, we introduce an attribution pipeline that produces syntax-aware explanations for the termination estimation.
翻译:确定程序是否终止是程序分析中的核心挑战,直接关系到程序的正确性、验证及安全性。我们探究Transformer架构能否直接从源代码中识别终止模式,以及如何通过集成方法增强其能力。为克服非终止样本的极端稀缺性,我们设计了一个紧凑型Transformer编码器的集成框架,并利用一组不平衡感知损失函数和类别感知采样技术进行系统性训练。通过组合使用不同损失函数训练的模型,我们的集成方法在性能上显著超越任何单一Transformer,不仅优于强大的现成大语言模型,还超越了基于图的方法。最后,我们引入了一个归因流水线,为终止性估计生成语法感知的解释。