There has been growing interest in automatically predicting missing type annotations in programs written in Python and JavaScript. While prior methods have achieved impressive accuracy when predicting the most common types, they often perform poorly on rare or complex types. In this paper, we present a new type inference method that treats type prediction as a code infilling task by leveraging CodeT5, a state-of-the-art seq2seq pre-trained language model for code. Our method uses static analysis to construct a dynamic context for each code element whose type signature is to be predicted by the model. We also propose an iterative decoding scheme that incorporates previous type predictions into the model's input context, allowing information exchange between related code elements. Our evaluation shows that the proposed approach, TypeT5, not only achieves higher overall accuracy (particularly on rare and complex types) but also produces more coherent results with fewer type errors, while still allowing easy user intervention.
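To make the iterative decoding idea concrete, the following is a minimal sketch rather than the TypeT5 implementation. It assumes each code element is represented by a signature string with a T5-style sentinel marking the missing type, that a hypothetical `related` map (standing in for the static analysis) lists the elements whose signatures belong in each element's context, and that `toy_predict` is a rule-based stand-in for the seq2seq model. All names here are illustrative.

```python
from typing import Callable, Dict, List

MASK = "<extra_id_0>"  # T5-style sentinel marking the type to infill (assumed)


def build_prompt(name: str,
                 signatures: Dict[str, str],
                 related: Dict[str, List[str]],
                 predictions: Dict[str, str]) -> str:
    """Context lines for related elements, followed by the masked target."""
    lines = []
    for other in related.get(name, []):
        # Substitute an earlier prediction when one exists; otherwise keep the mask.
        lines.append(signatures[other].replace(MASK, predictions.get(other, MASK)))
    lines.append(signatures[name])
    return "\n".join(lines)


def iterative_decode(order: List[str],
                     signatures: Dict[str, str],
                     related: Dict[str, List[str]],
                     predict: Callable[[str], str]) -> Dict[str, str]:
    """Predict one element at a time, feeding earlier predictions into later prompts."""
    predictions: Dict[str, str] = {}
    for name in order:
        predictions[name] = predict(build_prompt(name, signatures, related, predictions))
    return predictions


def toy_predict(prompt: str) -> str:
    """Hypothetical rule-based stand-in for the model; the target is the last line."""
    target = prompt.splitlines()[-1]
    if "int(text)" in target:
        return "int"
    # The caller is only resolvable once the callee's predicted type is in context.
    if "parse(" in target and "-> int" in prompt:
        return "int"
    return "Any"


SIGNATURES = {
    "parse": f"def parse(text: str) -> {MASK}: return int(text)",
    "total": f"def total(items) -> {MASK}: return sum(parse(i) for i in items)",
}
RELATED = {"total": ["parse"]}  # total calls parse, so parse appears in its context

callee_first = iterative_decode(["parse", "total"], SIGNATURES, RELATED, toy_predict)
caller_first = iterative_decode(["total", "parse"], SIGNATURES, RELATED, toy_predict)
print(callee_first)
print(caller_first)
```

In this toy setup, decoding `parse` before `total` lets the second prompt see the first prediction, whereas the reverse order leaves `total` with only the masked callee signature. The sketch only illustrates why earlier predictions in the context can help; the traversal order and context construction used by the actual method are not specified here.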