Complex QA and language models hybrid architectures, Survey

This paper surveys the state of the art in hybrid language model architectures and strategies for "complex" question answering (QA, CQA, CPS). Very large language models are good at leveraging public data on standard problems, but tackling more specific complex questions or problems may require specific architectures, knowledge, skills, tasks, methods, sensitive data, performance, human approval, and versatile feedback. This survey extends findings from the robust, community-edited research papers BIG, BLOOM, and HELM, which open-source, benchmark, and analyze the limits and challenges of large language models in terms of task complexity and strict evaluation on accuracy (e.g. fairness, robustness, toxicity). It identifies the key elements used with large language models (LLMs) to solve complex questions or problems. Recent projects such as ChatGPT and GALACTICA have allowed non-specialists to grasp the great potential, as well as the equally strong limitations, of language models in complex QA. Hybridizing these models with different components could overcome these limits and go much further. We discuss some challenges associated with complex QA, including domain adaptation, decomposition and efficient multi-step QA, long-form QA, non-factoid QA, safety and multi-sensitivity data protection, multimodal search, hallucinations, QA explainability and truthfulness, and the time dimension. We then review current solutions and promising strategies, using elements such as hybrid LLM architectures, human-in-the-loop reinforcement learning, prompting adaptation, neuro-symbolic and structured knowledge grounding, program synthesis, and others. We analyze existing solutions and provide an overview of current research and trends in the area of complex QA.
