Type inference methods based on deep learning are becoming increasingly popular because they aim to compensate for the drawbacks of static and dynamic analysis approaches, such as high uncertainty. However, their practical application remains debatable because of several intrinsic issues; for example, code from different software domains involves data types that are unknown to the type inference system. To overcome these problems and obtain high-confidence predictions, we present TIPICAL, a method that combines deep similarity learning with novelty detection. We show that our method can better predict data types with high confidence by successfully filtering out unknown and inaccurately predicted data types, and that it achieves higher F1 scores than the state-of-the-art type inference method Type4Py. Additionally, we investigate how different software domains and data type frequencies may affect the results of our method.
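To make the general idea concrete, the following is a minimal sketch of how similarity-based type prediction can be paired with a novelty check. It is not the TIPICAL implementation: the function names (`euclidean`, `predict_type`), the nearest-neighbour vote, the distance threshold `max_distance`, and the toy two-dimensional embeddings are all assumptions made for illustration. In a real system, the embeddings would come from a trained similarity model.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_type(query, known, k=3, max_distance=1.0):
    """Predict a type label for `query`, or return None if it looks novel.

    `known` is a list of (embedding, type_label) pairs. Both the k-NN vote
    and the distance threshold are illustrative assumptions, not the
    method described in the paper.
    """
    # Rank known examples by their distance to the query embedding.
    neighbours = sorted(
        (euclidean(query, emb), label) for emb, label in known
    )[:k]

    # Novelty check: reject the query if even the closest example is far away.
    nearest_distance = neighbours[0][0]
    if nearest_distance > max_distance:
        return None

    # Otherwise, take a majority vote among the k nearest neighbours.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Toy embeddings (hypothetical): two clusters standing for `int` and `str`.
known = [
    ([0.0, 0.0], "int"),
    ([0.1, 0.0], "int"),
    ([1.0, 1.0], "str"),
    ([1.1, 0.9], "str"),
]

near_int = predict_type([0.05, 0.0], known, k=3, max_distance=0.5)
far_away = predict_type([5.0, 5.0], known, k=3, max_distance=0.5)
```

Here `near_int` falls inside the `int` cluster and receives that label, while `far_away` lies far from every known example and is rejected with `None`. Rejecting such queries instead of forcing a label is what allows a system of this kind to trade coverage for higher-confidence predictions.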