Combinatorial optimization problems arise in logistics, scheduling, and resource allocation, yet existing approaches face a fundamental trade-off among generality, performance, and usability. We present cuGenOpt, a GPU-accelerated general-purpose metaheuristic framework that addresses all three dimensions simultaneously. At the engine level, cuGenOpt adopts a "one block evolves one solution" CUDA architecture with a unified encoding abstraction (permutation, binary, integer), a two-level adaptive operator selection mechanism, and hardware-aware resource management. At the extensibility level, a user-defined operator registration interface allows domain experts to inject problem-specific CUDA search operators. At the usability level, a JIT compilation pipeline exposes the framework as a pure-Python API, and an LLM-based modeling assistant converts natural-language problem descriptions into executable solver code. Experiments across five thematic suites on three GPU architectures (T4, V100, A800) show that cuGenOpt outperforms general-purpose MIP solvers by orders of magnitude, achieves competitive solution quality against specialized solvers on instances up to n=150, and attains a 4.73% gap on TSP-442 within 30 s. Twelve problem types spanning five encoding variants are solved to optimality. Framework-level optimizations cumulatively reduce the pcb442 gap from 36% to 4.73% and boost VRPTW throughput by 75-81%. Code: https://github.com/L-yang-yang/cugenopt
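The gap figures quoted above are not defined in the abstract itself. A minimal sketch, assuming the usual relative gap of a found objective value to a best-known reference value for a minimization problem; the function name and the numbers are hypothetical and are not taken from cuGenOpt or its experiments:

```python
def optimality_gap_percent(found_cost: float, reference_cost: float) -> float:
    """Relative gap, in percent, between a found cost and a reference cost.

    Assumes a minimization problem and a strictly positive reference cost
    (for example, a best-known tour length).
    """
    if reference_cost <= 0:
        raise ValueError("reference_cost must be positive")
    return 100.0 * (found_cost - reference_cost) / reference_cost


# Hypothetical values chosen only to illustrate the arithmetic.
reference = 1000.0   # best-known objective value (assumed)
found = 1047.3       # objective value returned by a heuristic (assumed)
gap = optimality_gap_percent(found, reference)
print(f"gap = {gap:.2f}%")
```

Under this definition, a gap of 0% means the heuristic matched the reference value, which is how "solved to optimality" would read when the reference is a proven optimum.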