SkillChain-Gym: A Benchmark for Reskilling-Aware Production-Inventory Control under Disruptions

Production planning increasingly has to treat workforce capability as a decision variable: certifications lapse when skills are not maintained, new products require skills the current workforce does not hold, and reskilling competes for the same worker hours needed for production. Existing operations benchmarks usually treat labor as exogenous, while workforce-planning models with skills and learning are rarely released as reusable testbeds. We introduce SkillChain-Gym, a benchmark specification for reskilling-aware production-inventory control: a single-site environment with stylized worker skill-state dynamics, hard-threshold certification, forgetting, and capacity-consuming training actions that draw on the same per-worker time budget as production. The benchmark includes seed-controlled disruption scenarios, three feasibility modes with projection diagnostics, deterministic replay, and metrics covering operations, resilience, capability growth, and training-access distribution. We evaluate production-only, reactive adaptive, water-filling adaptive, and static-insurance policies, with budget variants, over 60-shift horizons using paired statistical tests. The results are regime-dependent and do not reduce to a single ranking. Training-capable policies dominate the production-only baseline, and maintenance training is necessary under forgetting even without disruptions. Among training-capable classes, adaptive training helps when bottlenecks are visible in the forecast, while a lean static cross-training plan (a deliberately favorable comparator whose structure encodes the relevant skill contingencies) acts as strong insurance under surprise shocks and absenteeism. Capacity slack and the forgetting rate govern the boundary between these regimes. No training-capable policy class dominates across all regimes, which motivates forecast-driven controllers that decide when to buy skill insurance and when to react.
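To make the coupling between training, forgetting, and certification concrete, the following is a minimal sketch of one worker's skill update over a single shift. It is not the benchmark's implementation: the function names, the multiplicative forgetting rule, the saturating learning rule, and all numeric constants are assumptions chosen for illustration. The only elements taken from the description above are that certification is a hard threshold on skill, that skill decays when not maintained, and that training hours and production hours share one per-worker time budget.

```python
# Illustrative sketch only: all names and constants below are hypothetical
# and are not taken from the SkillChain-Gym specification.

CERT_THRESHOLD = 0.6  # assumed hard certification threshold on skill level
FORGET_RATE = 0.05    # assumed fraction of skill lost per shift
LEARN_RATE = 0.2      # assumed gain per full shift of training
SHIFT_HOURS = 8.0     # assumed per-worker time budget per shift


def step_skill(level, train_hours, produce_hours, shift_hours=SHIFT_HOURS):
    """Advance one worker's skill level in [0, 1] by one shift.

    Training and production draw on the same time budget, so every hour
    spent training is an hour not available for production.
    """
    if train_hours < 0 or produce_hours < 0:
        raise ValueError("hours must be non-negative")
    if train_hours + produce_hours > shift_hours + 1e-9:
        raise ValueError("training plus production exceeds the time budget")
    # Forgetting: skill decays multiplicatively every shift.
    decayed = level * (1.0 - FORGET_RATE)
    # Learning: gain scales with the share of the shift spent training and
    # shrinks as skill approaches the ceiling of 1.
    gained = LEARN_RATE * (train_hours / shift_hours) * (1.0 - decayed)
    return min(1.0, decayed + gained)


def is_certified(level):
    """Hard-threshold certification: certified iff skill meets the threshold."""
    return level >= CERT_THRESHOLD


if __name__ == "__main__":
    start = 0.62  # certified, but close to the threshold
    no_training = step_skill(start, train_hours=0.0, produce_hours=8.0)
    maintained = step_skill(start, train_hours=2.0, produce_hours=6.0)
    print(is_certified(no_training), is_certified(maintained))
```

Under these assumed constants, a worker who starts just above the threshold loses certification after one shift of production only, while spending two of the eight hours on training keeps the certification at the cost of two production hours. This is the maintenance-training trade-off the abstract refers to, shown here for one worker and one skill.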
