Dynamic languages such as Python and JavaScript offer flexibility and simplified type handling, but this can also increase type-related errors and add overhead for compile-time type inference. As a result, type inference for dynamic languages has become a popular research area. Existing approaches typically perform type inference through static analysis, machine learning, or large language models (LLMs). However, current work uses only the direct dependencies of the target variable as context, which leaves the contextual information incomplete and reduces the accuracy of type inference. To address this issue, this paper proposes TypePro, a method that leverages LLMs for type inference in dynamic languages. TypePro supplements contextual information through inter-procedural code slicing. It then proposes a set of candidate complex types based on the structural information about data types implied in the slices, thereby compensating for the LLMs' lack of domain knowledge. We conducted experiments on the ManyTypes4Py and ManyTypes4TypeScript datasets, achieving Top-1 exact match (EM) rates of 88.9% and 86.6%, respectively. Notably, TypePro improves Top-1 EM by 7.1 and 10.3 percentage points over the second-best approach, demonstrating its effectiveness and robustness.
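To make the idea of supplementing context across procedure boundaries concrete, the following is a minimal sketch in Python. It is not TypePro's slicing algorithm, which the abstract does not specify. It is a toy approximation that assumes single-line statements and direct calls by name, and the names `toy_slice` and `EXAMPLE` are hypothetical. For a function parameter, it collects the statements inside the function that use the parameter, plus each call site in other functions and the assignment that defines the argument passed there.

```python
import ast

# Hypothetical example program: the type of `data` in `render` is easier to
# infer once the caller's dictionary literal is part of the context.
EXAMPLE = """\
def render(data):
    return sorted(data.items())

def main():
    scores = {"alice": 0.9, "bob": 0.7}
    print(render(scores))
"""


def toy_slice(source: str, func_name: str, param: str) -> list:
    """Return source lines relevant to `param` of `func_name` (toy approximation)."""
    tree = ast.parse(source)
    funcs = {n.name: n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)}
    target = funcs[func_name]
    index = [a.arg for a in target.args.args].index(param)
    keep = set()

    # Intra-procedural part: statements in the target that mention the parameter.
    for stmt in target.body:
        if any(isinstance(n, ast.Name) and n.id == param for n in ast.walk(stmt)):
            keep.add(stmt.lineno)

    # Inter-procedural part: call sites of the target, plus the assignment in
    # the caller that defines the variable passed as the argument.
    for caller in funcs.values():
        for stmt in caller.body:
            for node in ast.walk(stmt):
                is_call = (
                    isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Name)
                    and node.func.id == func_name
                    and len(node.args) > index
                )
                if not is_call:
                    continue
                keep.add(stmt.lineno)
                arg = node.args[index]
                if isinstance(arg, ast.Name):
                    for prev in caller.body:
                        if isinstance(prev, ast.Assign) and any(
                            isinstance(t, ast.Name) and t.id == arg.id
                            for t in prev.targets
                        ):
                            keep.add(prev.lineno)

    lines = source.splitlines()
    # Only the first line of each statement is kept (single-line assumption).
    return [lines[i - 1].strip() for i in sorted(keep)]


if __name__ == "__main__":
    for line in toy_slice(EXAMPLE, "render", "data"):
        print(line)
```

In this sketch, the body of `render` alone only shows that `data` has an `items()` method, while the caller's assignment suggests a dictionary from strings to floats. A context restricted to direct dependencies would miss that second piece of evidence.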