Large language models (LLMs) have recently emerged as capable tools for addressing complex optimization challenges. Nevertheless, a predominant limitation of existing LLM-based optimization methods is that they struggle to capture the relationships among decision variables when relying exclusively on numerical text prompts, especially in high-dimensional problems. To address this, we propose, for the first time, to enhance optimization performance using a multimodal LLM capable of processing both textual and visual prompts, which yields deeper insight into the optimization problem at hand. This integration allows for a more comprehensive understanding of the optimization problem, akin to human cognitive processes. We develop a multimodal LLM-based optimization framework that simulates human problem-solving workflows, thereby offering a more nuanced and effective analysis. The efficacy of this method is evaluated through extensive empirical studies on a well-known combinatorial optimization problem, the capacitated vehicle routing problem. The results are compared against those obtained from LLM-based optimization algorithms that rely solely on textual prompts, demonstrating the significant advantages of our multimodal approach.