Generative Type Inference for Python

Python is a popular dynamic programming language, as evidenced by its ranking as the second most commonly used language on GitHub. However, its dynamic type system can lead to type errors, which has led researchers to explore automatic type inference for Python programs. Rule-based type inference approaches can ensure the accuracy of predicted variable types, but they suffer from low coverage. Supervised type inference approaches, while feature-agnostic, require large, high-quality annotated datasets and are limited to pre-defined types. Cloze-style approaches, which work zero-shot, reformulate type inference as a fill-in-the-blank problem, but their performance is limited. This paper introduces TypeGen, a few-shot generative type inference approach that incorporates domain knowledge from static analysis. TypeGen creates chain-of-thought (COT) prompts by translating the type inference steps of static analysis, based on type dependency graphs (TDGs), into prompts, enabling language models to learn how static analysis infers types. By combining COT prompts with code slices and type hints, TypeGen constructs example prompts from human annotations. TypeGen requires only a few annotated examples to teach language models, via in-context learning, to generate similar COT prompts. Moreover, TypeGen improves the interpretability of its results through an input-explanation-output strategy. Experiments show that, using only five examples, TypeGen outperforms the best baseline, Type4Py, by 10.0% in argument type prediction and 22.5% in return value type prediction in terms of top-1 Exact Match. Furthermore, TypeGen achieves top-1 Exact Match improvements of 27% to 84% over the zero-shot performance of large language models ranging from 1.3B to 175B parameters.
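To make the chain-of-thought idea concrete, here is a minimal sketch, not TypeGen's implementation. It assumes a toy type dependency graph in which each node records the statement that defines it and the nodes it depends on; the class `Node`, the functions `infer_with_cot`, `format_example`, and `build_few_shot_prompt`, the union rule, and the prompt wording are all hypothetical. The sketch walks the graph from the target back to its leaves, turns each propagation step into a sentence (the explanation), and packs code, explanation, and answer into an input-explanation-output example that precedes an unanswered query.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a toy type dependency graph (hypothetical, simplified)."""
    expr: str                                      # source statement behind the node
    deps: list[str] = field(default_factory=list)  # keys of nodes this one depends on
    known_type: str | None = None                  # set only for leaves such as literals


def infer_with_cot(graph: dict[str, Node], target: str) -> tuple[str, list[str]]:
    """Propagate types to `target`, recording one reasoning sentence per visited node."""
    steps: list[str] = []
    memo: dict[str, str] = {}

    def visit(key: str) -> str:
        if key in memo:
            return memo[key]
        node = graph[key]
        if node.known_type is not None:
            t = node.known_type
            steps.append(f"In `{node.expr}`, the literal has type `{t}`.")
        else:
            # Toy rule (assumption): a node fed by several definitions
            # gets the union of their types, deduplicated in order.
            t = " | ".join(dict.fromkeys(visit(d) for d in node.deps))
            steps.append(
                f"`{node.expr}` can receive values from {', '.join(node.deps)}, "
                f"so its type is `{t}`."
            )
        memo[key] = t
        return t

    return visit(target), steps


def format_example(code: str, target: str, steps: list[str], answer: str) -> str:
    """Build one input-explanation-output example (hypothetical prompt format)."""
    return (
        f"Python code:\n{code}\n"
        f"Q: What is the type of {target}?\n"
        f"A: {' '.join(steps)} Therefore, the answer is `{answer}`.\n"
    )


def build_few_shot_prompt(examples: list[str], query_code: str, query_target: str) -> str:
    """Prepend annotated examples to an open query; a model would continue after 'A:'."""
    query = f"\nPython code:\n{query_code}\nQ: What is the type of {query_target}?\nA:"
    return "\n".join(examples) + query


# Usage: one annotated example whose return value can be int or str.
EXAMPLE_CODE = '''def parse(flag):
    if flag:
        result = 0
    else:
        result = "none"
    return result'''

graph = {
    "result@3": Node("result = 0", known_type="int"),
    "result@5": Node('result = "none"', known_type="str"),
    "return": Node("return result", deps=["result@3", "result@5"]),
}

answer, steps = infer_with_cot(graph, "return")
example = format_example(EXAMPLE_CODE, "the return value of parse", steps, answer)
prompt = build_few_shot_prompt([example], "def double(n):\n    return n * 2", "the argument n")
print(prompt)
```

In a real system the graph would come from static analysis rather than being written by hand, and the prompt would be sent to a language model that continues after the final `A:` with its own explanation and answer; both steps are outside this sketch.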

翻译：Python是一种流行的动态编程语言，其在GitHub上排名第二的使用率便是有力证明。然而，其动态类型系统可能导致潜在的类型错误，促使研究者探索针对Python程序的自动类型推断方法。基于规则的类型推断方法能确保预测变量类型的准确性，但存在覆盖率低的问题。有监督类型推断方法虽不依赖特定特征，但需要大规模高质量标注数据集，且局限于预定义类型。作为零样本方法，完形填空式方法将类型推断问题重构为填空题，但性能有限。本文提出TypeGen——一种结合静态分析领域知识的少样本生成式类型推断方法。TypeGen通过将静态分析的类型推断步骤基于类型依赖图（TDG）转化为提示，创建思维链（COT）提示，使语言模型能够学习静态分析推断类型的方式。通过将COT提示与代码片段及类型提示相结合，TypeGen从人工标注中构建示例提示。TypeGen仅需极少量标注示例，通过上下文学习即可教会语言模型生成类似的COT提示。此外，TypeGen采用“输入-解释-输出”策略提升了结果的可解释性。实验表明，仅使用五个示例，TypeGen在参数类型预测和返回值类型预测的Top-1精确匹配率上，分别以10.0%和22.5%的优势超越最佳基线模型Type4Py。同时，在参数规模从1.3B到175B的大语言模型中，TypeGen的Top-1精确匹配率相比其零样本性能提升了27%至84%。