An open challenge in multimodal conversational AI is augmenting large language models with information from textual and non-textual sources for multi-turn dialogue. To address this problem, this paper introduces Conversational Tables (cTBL), a three-step encoder-decoder approach that retrieves tabular information and generates dialogue responses grounded in the retrieved information. cTBL uses Transformer encoder embeddings for Dense Table Retrieval and obtains up to 5% relative improvement in Top-1 and Top-3 accuracy over sparse retrieval on the HybriDialogue dataset. Additionally, cTBL performs tabular knowledge retrieval using both encoder and decoder models, resulting in up to 46% relative improvement in ROUGE scores and better human evaluation results for response generation on HybriDialogue.
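To make the retrieval metrics concrete, the following is a minimal sketch of how dense table retrieval can be scored with Top-k accuracy and how a relative improvement is computed. It is not the cTBL implementation: the three-dimensional vectors are hand-written stand-ins for Transformer encoder embeddings, and the table ids, queries, and function names are hypothetical, chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_tables(query_vec, table_vecs):
    """Return table ids sorted by descending similarity to the query."""
    return sorted(
        table_vecs,
        key=lambda table_id: cosine(query_vec, table_vecs[table_id]),
        reverse=True,
    )


def top_k_accuracy(rankings, gold_ids, k):
    """Fraction of queries whose gold table appears in the first k results."""
    hits = sum(1 for ranked, gold in zip(rankings, gold_ids) if gold in ranked[:k])
    return hits / len(gold_ids)


def relative_improvement(new, baseline):
    """Relative gain of `new` over `baseline`, e.g. 0.05 means 5%."""
    return (new - baseline) / baseline


# Toy data (assumption): in practice these vectors would come from an encoder.
table_vecs = {
    "t_films": [1.0, 0.0, 0.0],
    "t_athletes": [0.0, 1.0, 0.0],
    "t_cities": [0.0, 0.0, 1.0],
}
queries = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.3], [0.5, 0.6, 0.0]]
gold = ["t_films", "t_athletes", "t_films"]

rankings = [rank_tables(q, table_vecs) for q in queries]
top1 = top_k_accuracy(rankings, gold, k=1)
top3 = top_k_accuracy(rankings, gold, k=3)
print(f"Top-1: {top1:.3f}, Top-3: {top3:.3f}")
```

A "relative improvement" compares two scores as a ratio rather than a difference: under this definition, moving from a hypothetical baseline accuracy of 0.60 to 0.63 is a 5% relative gain, even though the absolute gain is 3 points.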