Python's dynamic typing system offers flexibility and expressiveness but can lead to type-related errors, motivating automated type inference to support type hinting. While existing learning-based approaches achieve promising inference accuracy, they struggle with the practical challenge of comprehensively handling the full range of types, including complex generic types and (unseen) user-defined types. In this paper, we introduce TIGER, a two-stage generating-then-ranking (GTR) framework designed to effectively handle Python's diverse type categories. TIGER fine-tunes pre-trained code models to obtain two components: a generative model trained with a span masking objective and a similarity model trained with a contrastive objective. This design allows TIGER to generate a wide range of type candidates, including complex generics, in the generating stage, and to accurately rank these candidates together with user-defined types in the ranking stage. Our evaluation on the ManyTypes4Py dataset shows TIGER's advantage over existing methods across various type categories, notably improving Top-5 Exact Match accuracy for user-defined and unseen types by 11.2% and 20.1%, respectively. Moreover, the experimental results not only demonstrate TIGER's superior performance and efficiency but also underscore the importance of both its generating and ranking stages for automated type inference.
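As a rough illustration of the generating-then-ranking idea, the sketch below stands in for both trained models with plain callables. The names `generate_then_rank`, `top_k_exact_match`, `generate`, and `embed` are hypothetical, cosine similarity between a context embedding and a candidate embedding is an assumed scoring rule, and nothing here reproduces TIGER's actual models, training objectives, or data.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def generate_then_rank(
    context: str,
    generate: Callable[[str], List[str]],  # hypothetical stand-in for the generative model
    user_types: Sequence[str],             # user-defined types visible to the code (assumed input)
    embed: Callable[[str], Vector],        # hypothetical stand-in for the similarity model
    k: int = 5,
) -> List[str]:
    # Generating stage: model candidates (possibly generics such as "List[int]")
    # merged with user-defined types; dict.fromkeys removes duplicates, keeping order.
    candidates = list(dict.fromkeys(list(generate(context)) + list(user_types)))
    # Ranking stage: order every candidate by similarity to the code context.
    ctx_vec = embed(context)
    ranked = sorted(candidates, key=lambda t: cosine(ctx_vec, embed(t)), reverse=True)
    return ranked[:k]


def top_k_exact_match(predictions: List[List[str]], gold: List[str], k: int = 5) -> float:
    """Fraction of examples whose gold type string appears among the top-k predictions."""
    if not gold:
        return 0.0
    hits = sum(1 for preds, g in zip(predictions, gold) if g in preds[:k])
    return hits / len(gold)
```

Merging user-defined types into the candidate pool before ranking reflects the abstract's point that types the generator may never have seen can still be ranked; how TIGER actually collects and scores them is described in the paper, not here.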