Emergent symbolic representations are critical for enabling developmental learning agents to plan and generalize across tasks. In this work, we investigate whether large language models (LLMs) can translate human natural language instructions into the internal symbolic representations that emerge during hierarchical reinforcement learning. We apply a structured evaluation framework to measure the translation performance of widely used LLMs (GPT, Claude, DeepSeek, and Grok) across different internal symbolic partitions generated by a hierarchical reinforcement learning algorithm in the Ant Maze and Ant Fall environments. Our findings reveal that although LLMs demonstrate some ability to translate natural language into a symbolic representation of the environment dynamics, their performance is highly sensitive to partition granularity and task complexity. These results expose limitations in current LLMs' capacity for representation alignment, highlighting the need for further research on robust alignment between language and internal agent representations.