Complex QA and language models hybrid architectures, Survey

This paper surveys the state of the art in hybrid language model architectures and strategies for "complex" question answering (QA, CQA, CPS). Very large language models are good at leveraging public data on standard problems, but tackling more specific complex questions or problems may require specific architectures, knowledge, skills, tasks, methods, sensitive data, performance, human approval, and versatile feedback. This survey extends findings from the robust, community-edited research papers BIG, BLOOM, and HELM, which open-source, benchmark, and analyze the limits and challenges of large language models in terms of task complexity and strict evaluation of accuracy (e.g. fairness, robustness, toxicity). It identifies the key elements used with large language models (LLMs) to solve complex questions or problems. Recent projects like ChatGPT and GALACTICA have allowed non-specialists to grasp the great potential, as well as the equally strong limitations, of language models in complex QA. Hybridizing these models with different components could overcome these limits and go much further. We discuss challenges associated with complex QA, including domain adaptation, decomposition and efficient multi-step QA, long-form QA, non-factoid QA, safety and multi-sensitivity data protection, multimodal search, hallucinations, QA explainability and truthfulness, and the time dimension. We then review current solutions and promising strategies, using elements such as hybrid LLM architectures, human-in-the-loop reinforcement learning, prompting adaptation, neuro-symbolic and structured knowledge grounding, program synthesis, and others. We analyze existing solutions and provide an overview of current research and trends in complex QA.
