Extracting semantic information from generated text is useful for applications such as automated fact checking and retrieval-augmented generation. Currently, this requires either running a separate model during inference, which increases computational cost, or destructively fine-tuning the language model. Instead, we propose embedding information extraction capabilities directly into pre-trained language models using probing classifiers, enabling efficient simultaneous text generation and information extraction. To this end, we introduce an approach called EMBER and show that it enables named entity recognition (NER) in decoder-only language models without fine-tuning them and with minimal additional computational cost at inference time. Specifically, our experiments using GPT-2 show that EMBER maintains high token generation rates during streaming text generation, with a negligible decrease in speed of around 1%, compared to a 43.64% slowdown measured for a baseline using a separate NER model. Code and data are available at https://github.com/nicpopovic/EMBER.
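To make the general idea of a probing classifier concrete, the following is a minimal sketch and not the EMBER implementation. It assumes that a probe is a small linear classifier applied to the per-token hidden state of a frozen model, so the generator itself is never modified. The names `toy_generate`, `probe`, `generate_with_extraction`, the two-dimensional hidden state, the label set, and the hand-set weights are all hypothetical stand-ins chosen for illustration; a real setup would read hidden states from a model such as GPT-2 and train the probe on labeled data.

```python
from typing import Iterator, List, Sequence, Tuple

# Hypothetical label set for illustration only.
LABELS = ["O", "ENTITY"]


def toy_generate(tokens: Sequence[str]) -> Iterator[Tuple[str, List[float]]]:
    """Stand-in for a frozen decoder that streams (token, hidden_state) pairs.

    The 2-dimensional "hidden state" is made up: [is_capitalized, length / 10].
    """
    for tok in tokens:
        yield tok, [1.0 if tok[:1].isupper() else 0.0, len(tok) / 10.0]


def probe(hidden: Sequence[float],
          weights: Sequence[Sequence[float]],
          bias: Sequence[float]) -> str:
    """Linear probe: one score per label, return the highest-scoring label."""
    scores = [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(weights, bias)
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return LABELS[best]


def generate_with_extraction(tokens: Sequence[str],
                             weights: Sequence[Sequence[float]],
                             bias: Sequence[float]) -> List[Tuple[str, str]]:
    """Label each token as it is produced, reusing the generator's hidden state."""
    return [(tok, probe(h, weights, bias)) for tok, h in toy_generate(tokens)]


# Hand-set probe parameters (assumption); a real probe would be trained.
WEIGHTS = [[-1.0, 0.0], [1.0, 0.0]]  # rows correspond to LABELS
BIAS = [0.5, 0.0]

if __name__ == "__main__":
    for token, label in generate_with_extraction(["Paris", "is", "nice"], WEIGHTS, BIAS):
        print(token, label)
```

Because the probe only reads a state that the generator computes anyway, the per-token overhead in this sketch is a single small matrix-vector product, which illustrates why such an approach can be cheaper than running a separate NER model over the generated text.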