Poker is a landmark challenge for artificial intelligence. The dominant approach relies on equilibrium solvers built on counterfactual regret minimization, which require millions of core-hours of training. Large language models (LLMs) possess extensive poker knowledge but perform far below solver-based agents when asked to play directly. Traditional rule-based poker agents are interpretable and training-free, but their strategic ceiling remains far below equilibrium play. We introduce \textbf{PokerSkill}, a training-free and solver-free framework that bridges this gap by using detailed rule-based poker skills as a structured action-grounding interface for LLMs. A deterministic context engine analyzes the current state and retrieves only the relevant fragments from a layered skill library designed entirely by human poker experts, constraining the LLM's choice to reasonable actions. Against GTOWizard, a state-of-the-art game-theory-optimal (GTO) benchmark, PokerSkill brings GPT-5.5 XHigh to $-57 \pm 21$ mbb/hand, Claude Opus 4.6 to $-80 \pm 29$ mbb/hand, and Claude Opus 4.7 to $-87 \pm 64$ mbb/hand, reducing losses by 49--61\% relative to default-prompt baselines and outperforming the strong bot Slumbot. Our key finding is that neither component suffices alone: rule-based skills do not constitute a strong strategy, and LLMs cannot play well unaided, but their combination yields an agent that requires neither training nor solver access yet competes with systems built on millions of core-hours of computation. To our knowledge, this is the first demonstration of an LLM achieving competitive performance in a complex imperfect-information game without game-specific training or solver queries. Code is available at https://github.com/lbn187/PokerSkill.
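
The retrieve-then-constrain loop described above can be illustrated with a minimal sketch. It is not the PokerSkill implementation: every name here (`State`, `Skill`, `LIBRARY`, `build_context`, `choose`) is hypothetical, the three rules are toy placeholders rather than expert-written skills, and the `llm` argument stands in for any function that maps a prompt string to a reply string.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch only: names and rules are illustrative assumptions,
# not taken from the PokerSkill repository.

@dataclass(frozen=True)
class State:
    street: str        # "preflop", "flop", "turn", or "river"
    facing_bet: bool   # True if the opponent has bet or raised
    pot: int           # pot size in chips
    stack: int         # our remaining stack in chips

@dataclass(frozen=True)
class Skill:
    layer: str                          # which layer of the library it belongs to
    applies: Callable[[State], bool]    # deterministic relevance test
    text: str                           # rule fragment shown to the LLM
    forbids: frozenset = frozenset()    # actions this rule rules out

# Toy layered library; real skills would be written by poker experts.
LIBRARY = [
    Skill("street", lambda s: s.street == "preflop",
          "Preflop: decide from position and hand strength."),
    Skill("street", lambda s: s.street == "river",
          "River: no cards to come, so bet for value or as a bluff."),
    Skill("stack", lambda s: s.facing_bet and s.stack <= s.pot,
          "Short stack facing a bet: fold or commit, do not just call.",
          frozenset({"call"})),
]

def legal_actions(state: State) -> set:
    """Actions allowed by the rules of the game in this state."""
    return {"fold", "call", "raise"} if state.facing_bet else {"check", "bet"}

def build_context(state: State):
    """Retrieve only the applicable skills and narrow the action set."""
    skills = [k for k in LIBRARY if k.applies(state)]
    allowed = legal_actions(state)
    for k in skills:
        allowed -= k.forbids
    return skills, sorted(allowed)

def choose(state: State, llm: Callable[[str], str]) -> str:
    """Ask the LLM to pick among the allowed actions; reject anything else."""
    skills, allowed = build_context(state)
    prompt = "\n".join(k.text for k in skills)
    prompt += "\nChoose one of: " + ", ".join(allowed)
    reply = llm(prompt).strip().lower()
    if reply in allowed:
        return reply
    # Safe fallback when the reply falls outside the constrained set.
    return "check" if "check" in allowed else "fold"
```

In this sketch the LLM never sees the full library and its reply is accepted only if it falls inside the narrowed action set, so an off-list answer degrades to a conservative check or fold rather than an arbitrary move.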