Large language models (LLMs) have demonstrated exceptional capabilities in language understanding and generation. However, in real-world scenarios, users' natural language expressions are often inherently fuzzy, ambiguous, and uncertain, giving rise to challenges such as vagueness, polysemy, and contextual ambiguity. This paper focuses on three challenges in LLM-based text generation tasks: instruction understanding, intention reasoning, and reliable dialog generation. Regarding complex human instructions, LLMs fall short in understanding long contexts and instructions across multi-round conversations. For intention reasoning, LLMs may exhibit inconsistent reasoning over instructions, difficulty reasoning about instructions that contain incorrect information, difficulty interpreting ambiguous user instructions, and a weak grasp of the user intention behind instructions. In terms of reliable dialog generation, LLMs may produce unstable or unethical content. To this end, we classify and analyze the performance of LLMs in these challenging scenarios and conduct a comprehensive evaluation of existing solutions. Furthermore, we introduce benchmarks and categorize them according to the three core challenges above. Finally, we explore potential directions for future research to enhance the reliability and adaptability of LLMs in real-world applications.