Most machine learning models predict a probability distribution over concrete outputs and struggle to predict names accurately when the distribution over sequences has high entropy. Here, we explore finding abstract, high-precision patterns intrinsic to these predictions, so that the resulting abstract predictions usefully capture rare sequences. In this short paper, we present Epicure, a method that distils the predictions of a sequence model, such as the output of beam search, into simple patterns. Epicure maps a model's predictions into a lattice of increasingly general patterns, each of which subsumes the concrete model predictions below it. On two tasks, predicting a descriptive name for a function given the source code of its body and detecting anomalous names given a function, we show that Epicure yields accurate naming patterns that match the ground truth more often than the highest-probability model prediction alone. At a false alarm rate of 10%, Epicure predicts patterns that match 61% more ground-truth names than the best model prediction, making Epicure well-suited to scenarios that require high precision.
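To make the idea of distilling beam-search output into patterns concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes that a name is a tuple of subtokens, that a pattern may replace any subtoken with a hypothetical single-subtoken wildcard `*`, and that a pattern is scored by the total probability of the beam entries it matches. The helper names (`generalizations`, `matches`, `best_pattern`), the example beam, and the thresholding rule are all illustrative assumptions.

```python
from itertools import combinations

WILDCARD = "*"  # assumed placeholder that stands for exactly one subtoken


def generalizations(tokens):
    """Yield every pattern obtained by replacing a subset of subtokens with WILDCARD."""
    n = len(tokens)
    for k in range(n + 1):
        for idxs in combinations(range(n), k):
            yield tuple(WILDCARD if i in idxs else t for i, t in enumerate(tokens))


def matches(pattern, tokens):
    """A pattern matches a name of the same length that agrees on all non-wildcard positions."""
    return len(pattern) == len(tokens) and all(
        p == WILDCARD or p == t for p, t in zip(pattern, tokens)
    )


def best_pattern(beam, threshold):
    """Return (pattern, mass) for the most specific pattern whose matched
    probability mass reaches `threshold`, or None if no pattern qualifies.

    `beam` is a list of (subtoken tuple, probability) pairs.
    """
    candidates = {p for tokens, _ in beam for p in generalizations(tokens)}
    scored = []
    for pattern in candidates:
        mass = sum(prob for tokens, prob in beam if matches(pattern, tokens))
        if mass >= threshold:
            n_wild = sum(1 for s in pattern if s == WILDCARD)
            # Fewer wildcards first (more specific), then larger mass.
            scored.append((n_wild, -mass, pattern))
    if not scored:
        return None
    _, neg_mass, pattern = min(scored)
    return pattern, -neg_mass


# Hypothetical beam-search output for one function body.
example_beam = [
    (("get", "user", "name"), 0.35),
    (("get", "user", "id"), 0.30),
    (("set", "user", "name"), 0.10),
    (("get", "name"), 0.05),
]

print(best_pattern(example_beam, threshold=0.6))
```

In this toy beam no single concrete name reaches the 0.6 threshold, but the two top candidates agree on their first two subtokens, so the sketch falls back to a pattern that fixes those and leaves the last position open. Raising the threshold forces more general patterns, and a threshold that no pattern can reach yields no prediction at all, which is the kind of precision-for-coverage trade-off the abstract describes.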