In this article, we use probing to investigate phenomena that occur during fine-tuning and knowledge distillation of a BERT-based natural language understanding (NLU) model. Our ultimate purpose is to use probing to better understand practical production problems and, consequently, to build better NLU models. We design experiments to see how fine-tuning changes the linguistic capabilities of BERT, what the optimal size of the fine-tuning dataset is, and how much information is contained in a distilled NLU model based on a tiny Transformer. The results show that the probing paradigm in its current form is not well suited to answering such questions: Structural, Edge, and Conditional probes do not take into account how easy it is to decode the probed information. We therefore conclude that quantifying information decodability is critical for many practical applications of the probing paradigm.