Despite remarkable progress in natural language processing, even state-of-the-art models often make incorrect predictions. Such predictions hamper the reliability of systems and limit their widespread adoption in real-world applications. 'Selective prediction' partly addresses this concern by enabling models to abstain from answering when their predictions are likely to be incorrect. While selective prediction is advantageous, it leaves us with a pertinent question: 'what to do after abstention?' To this end, we present an explorative study on 'Post-Abstention', a task that allows re-attempting the abstained instances with the aim of increasing the 'coverage' of the system without significantly sacrificing its 'accuracy'. We first provide a mathematical formulation of this task and then explore several methods to solve it. Comprehensive experiments on 11 QA datasets show that these methods lead to considerable improvements in risk (the performance metric of the Post-Abstention task) in both the in-domain and the out-of-domain settings. We also conduct a thorough analysis of these results, which leads to several interesting findings. Finally, we believe that our work will encourage and facilitate further research in this important area of addressing the reliability of NLP systems.
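To make the terms 'coverage' and 'risk' concrete, here is a minimal sketch of selective prediction followed by a re-attempt on the abstained instances. It is not the formulation or any of the methods from this work: the confidence threshold, the toy data, and the function names (`selective_predict`, `coverage_and_risk`, `reattempt`) are all assumptions made for illustration. Coverage is taken to be the fraction of instances answered, and risk the error rate over the answered instances, which is the usual convention in the selective prediction literature.

```python
from typing import List, Tuple


def selective_predict(confidences: List[float], threshold: float) -> List[bool]:
    """Return True where the system answers and False where it abstains.

    Assumption: abstention is decided by a simple confidence threshold.
    """
    return [c >= threshold for c in confidences]


def coverage_and_risk(answered: List[bool], correct: List[bool]) -> Tuple[float, float]:
    """Coverage = fraction answered; risk = error rate among answered instances."""
    n_answered = sum(answered)
    coverage = n_answered / len(answered)
    if n_answered == 0:
        return coverage, 0.0  # nothing answered, so no errors were made
    errors = sum(1 for a, c in zip(answered, correct) if a and not c)
    return coverage, errors / n_answered


def reattempt(
    answered: List[bool],
    correct: List[bool],
    second_conf: List[float],
    second_correct: List[bool],
    threshold: float,
) -> Tuple[List[bool], List[bool]]:
    """Re-attempt only the abstained instances with a hypothetical second stage.

    An abstained instance becomes answered if the second stage is confident
    enough; its correctness is then that of the second-stage prediction.
    """
    new_answered, new_correct = list(answered), list(correct)
    for i, was_answered in enumerate(answered):
        if not was_answered and second_conf[i] >= threshold:
            new_answered[i] = True
            new_correct[i] = second_correct[i]
    return new_answered, new_correct


# Toy data (invented for illustration, not taken from any experiment).
confidences = [0.95, 0.80, 0.40, 0.30, 0.90]
correct = [True, True, False, True, False]

answered = selective_predict(confidences, threshold=0.5)
cov_before, risk_before = coverage_and_risk(answered, correct)

# Second-stage outputs; entries for already-answered instances are ignored.
second_conf = [0.0, 0.0, 0.7, 0.2, 0.0]
second_correct = [False, False, True, False, False]

answered2, correct2 = reattempt(answered, correct, second_conf, second_correct, 0.5)
cov_after, risk_after = coverage_and_risk(answered2, correct2)

print(f"before: coverage={cov_before:.2f}, risk={risk_before:.2f}")
print(f"after:  coverage={cov_after:.2f}, risk={risk_after:.2f}")
```

In this toy case the re-attempt answers one previously abstained instance correctly, so coverage rises while risk falls. A re-attempt that answered it incorrectly would raise coverage at the cost of higher risk, which is the trade-off the task is concerned with.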