Deep neural retrieval models have amply demonstrated their power, but estimating the reliability of their predictions remains challenging. Most dialog response retrieval models output a single score indicating how relevant a response is to a given question. However, deep neural networks are often poorly calibrated, so this single score carries uncertainty that it does not express, and unreliable predictions can misinform user decisions. To address these issues, we present PG-DRR, an efficient calibration and uncertainty estimation framework for dialog response retrieval models, which adds a Gaussian Process layer to a deterministic deep neural network and recovers conjugacy for tractable posterior inference via P\'{o}lya-Gamma augmentation. PG-DRR achieves the lowest empirical calibration error (ECE) on the in-domain datasets and in the distributional shift task while maintaining $R_{10}@1$ and MAP performance.
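To make the calibration metric concrete, the following is a minimal sketch of a binned calibration error for binary relevance scores. It is not the evaluation code behind the reported results: the function name `calibration_error`, the equal-width binning, and the default of 10 bins are assumptions made for illustration, since the abstract does not specify how ECE is computed.

```python
import numpy as np


def calibration_error(confidences, labels, n_bins=10):
    """Binned calibration error for binary relevance predictions.

    confidences: predicted probability that each response is relevant, in [0, 1].
    labels: 1 if the response is actually relevant, else 0.
    The binning scheme (equal-width, n_bins=10) is an illustrative assumption.
    """
    confidences = np.asarray(confidences, dtype=float)
    labels = np.asarray(labels, dtype=float)

    # Assign each prediction to one of n_bins equal-width bins over [0, 1];
    # a confidence of exactly 1.0 is placed in the last bin.
    bin_ids = np.minimum((confidences * n_bins).astype(int), n_bins - 1)

    error = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if not mask.any():
            continue
        # Gap between the mean predicted probability and the observed
        # fraction of relevant responses in this bin.
        gap = abs(confidences[mask].mean() - labels[mask].mean())
        # Weight the gap by the fraction of predictions falling in the bin.
        error += mask.mean() * gap
    return error


if __name__ == "__main__":
    # Hypothetical scores: every response gets 0.9, but only half are relevant.
    print(calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 0, 0]))
```

A well-calibrated model yields a value near zero; an overconfident model, such as the hypothetical one in the usage example, yields a value close to the gap between its stated confidence and its actual hit rate.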