Imaginations of WALL-E: Reconstructing Experiences with an Imagination-Inspired Module for Advanced AI Systems

In this paper, we introduce a novel Artificial Intelligence (AI) system inspired by the philosophical and psychoanalytical concept of imagination as a ``Re-construction of Experiences''. Our AI system is equipped with an imagination-inspired module that bridges the gap between textual inputs and other modalities, enriching the derived information with previously learned experiences. A distinctive feature of the system is its ability to form independent perceptions of its inputs. This leads to interpretations of a concept that may differ from human interpretations but are equally valid, a phenomenon we term ``Interpretable Misunderstanding''. We employ large-scale models, specifically a Multimodal Large Language Model (MLLM), which enables the proposed system to extract meaningful information across modalities while remaining primarily unimodal. We evaluated our system against other large language models on multiple tasks, including emotion recognition and question answering, in a zero-shot setting to avoid the bias that fine-tuning may introduce. Our system outperformed the best-performing Large Language Model (LLM) on the MELD, IEMOCAP, and CoQA datasets, achieving Weighted F1 (WF1) scores of 46.74% on MELD and 25.23% on IEMOCAP and an Overall F1 (OF1) score of 17% on CoQA, compared to 22.89%, 12.28%, and 7%, respectively, for that LLM. The goal is to go beyond the statistical view of language processing and tie it to human concepts drawn from philosophy and psychoanalysis. This work represents a significant advancement in the development of imagination-inspired AI systems, opening new possibilities for AI to generate deep and interpretable information across modalities, thereby enhancing human-AI interaction.
