iJTyper: An Iterative Type Inference Framework for Java by Integrating Constraint- and Statistically-based Methods

Inferring the types of API elements in incomplete code snippets (e.g., those posted on Q&A forums) is a prerequisite for working with such snippets. Existing type inference methods fall mainly into two categories: constraint-based and statistically-based. Constraint-based methods place stricter requirements on code syntax and often suffer from low recall because code snippets are syntactically incomplete. Statistically-based methods rely on regularities learned from a training corpus and do not take full advantage of the type constraints present in a snippet, which may lead to low precision. In this paper, we propose iJTyper, an iterative type inference framework for Java that integrates the strengths of both kinds of methods. Given a code snippet, iJTyper first applies a constraint-based method and augments the code context with the inferred types of API elements. It then applies a statistically-based method to the augmented snippet. The candidate types predicted for API elements are in turn used to improve the constraint-based method by reducing its pre-built knowledge base. iJTyper executes both methods iteratively, alternating code context augmentation and knowledge base reduction, until a termination condition is satisfied. The final inference results are obtained by combining the results of both methods. We evaluated iJTyper on two open-source datasets. The results show that 1) iJTyper achieves an average precision of 97.31% and an average recall of 92.52% across the two datasets; 2) iJTyper improves the recall of two state-of-the-art baselines, SnR and MLMTyper, by at least 7.31% and 27.44%, respectively; and 3) iJTyper improves the average precision and recall of the popular language model ChatGPT by 3.25% and 0.51%, respectively, across the two datasets.
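The iterative loop described above can be illustrated with a minimal Java sketch. It is not the iJTyper implementation. The interfaces `ConstraintInferrer` and `StatisticalInferrer`, the toy inferrers, the fixed-point termination test, and the rule of preferring the constraint-based result when combining are all assumptions made for illustration, since the abstract does not specify them. API elements are represented by their simple names and the knowledge base by a set of fully qualified names.

```java
import java.util.*;

public class IterativeTypeInferenceSketch {

    /** Hypothetical constraint-based step: resolves elements using the context and a knowledge base. */
    interface ConstraintInferrer {
        Map<String, String> infer(List<String> elements, Map<String, String> context, Set<String> kb);
    }

    /** Hypothetical statistical step: returns ranked candidate types for each element. */
    interface StatisticalInferrer {
        Map<String, List<String>> predict(List<String> elements, Map<String, String> context);
    }

    static Map<String, String> infer(List<String> elements, Set<String> fullKb,
                                     ConstraintInferrer constraint, StatisticalInferrer statistical,
                                     int maxIterations) {
        Map<String, String> context = new LinkedHashMap<>(); // types fixed so far (augmented context)
        Set<String> kb = new HashSet<>(fullKb);               // current, possibly reduced, knowledge base
        Map<String, List<String>> candidates = new HashMap<>();

        for (int i = 0; i < maxIterations; i++) {
            // 1) Constraint-based pass, followed by code context augmentation.
            boolean contextChanged = false;
            for (Map.Entry<String, String> e : constraint.infer(elements, context, kb).entrySet()) {
                String previous = context.put(e.getKey(), e.getValue());
                if (!e.getValue().equals(previous)) contextChanged = true;
            }
            // 2) Statistical pass on the augmented context.
            candidates = statistical.predict(elements, context);
            // 3) Knowledge base reduction: keep only the predicted candidate types.
            Set<String> reduced = new HashSet<>();
            for (List<String> c : candidates.values()) reduced.addAll(c);
            reduced.retainAll(fullKb);
            boolean kbChanged = !reduced.isEmpty() && !reduced.equals(kb);
            if (kbChanged) kb = reduced;
            // 4) Assumed termination condition: neither the context nor the knowledge base changed.
            if (!contextChanged && !kbChanged) break;
        }

        // Assumed combination rule: prefer the constraint-based type, else the top-ranked candidate.
        Map<String, String> result = new LinkedHashMap<>();
        for (String el : elements) {
            List<String> ranked = candidates.getOrDefault(el, List.of());
            if (context.containsKey(el)) result.put(el, context.get(el));
            else if (!ranked.isEmpty()) result.put(el, ranked.get(0));
        }
        return result;
    }

    /** Toy constraint step: resolves a simple name only if exactly one knowledge base entry matches it. */
    static final ConstraintInferrer TOY_CONSTRAINT = (els, ctx, kb) -> {
        Map<String, String> out = new HashMap<>();
        for (String el : els) {
            List<String> matches = new ArrayList<>();
            for (String fqn : kb) if (fqn.endsWith("." + el)) matches.add(fqn);
            if (matches.size() == 1) out.put(el, matches.get(0));
        }
        return out;
    };

    /** Toy statistical step: its prediction for List becomes sharper once Date is in the context. */
    static final StatisticalInferrer TOY_STATISTICAL = (els, ctx) -> {
        Map<String, List<String>> out = new HashMap<>();
        out.put("Date", List.of("java.util.Date"));
        out.put("List", ctx.containsKey("Date")
                ? List.of("java.util.List")
                : List.of("java.util.List", "java.awt.List"));
        return out;
    };

    static Map<String, String> demo() {
        Set<String> kb = Set.of("java.util.Date", "java.sql.Date", "java.util.List", "java.awt.List");
        return infer(List.of("Date", "List"), kb, TOY_CONSTRAINT, TOY_STATISTICAL, 10);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints {Date=java.util.Date, List=java.util.List}
    }
}
```

In this toy run the constraint step resolves nothing on the first pass, because both simple names are ambiguous in the full knowledge base. The statistical candidates then shrink the knowledge base, which lets the constraint step resolve `Date` on the second pass and `List` on the third. This is the kind of mutual reinforcement that the framework relies on.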
