Industrial retrofit planning depends on structured operational data rather than free text: planners must estimate whether a newly registered prototype will require a retrofit, which retrofit package it will need, and how long the work will take. We study an industrial dataset linking a prototype-registration system (284,271 vehicles) with a retrofit-management system (48,716 cleaned visits), and compare strong tabular machine learning baselines with three LLM-based strategies on row-serialized inputs: embedding features (Amazon Titan), direct prompted classification (Claude Sonnet 4), and an ML+LLM stacking approach. Across binary occurrence prediction, 15-way retrofit-type classification, per-visit duration regression, and an aggregated monthly benchmark, classical tree ensembles remain the strongest standalone models. However, the LLM results reveal a consistent pattern: embeddings remain useful on tables (binary AUC = 0.982), direct prompting collapses once semantic signal is stripped by hashing (binary AUC = 0.500; multiclass weighted F1 = 0.018), and hybrid stacking yields the best manually built multiclass model (weighted F1 = 0.626). On the monthly benchmark, lag-based machine learning outperforms time-series foundation models, though Chronos-small remains competitive in zero-shot forecasting. The results suggest that on privacy-constrained industrial tables, LLMs are more effective as complementary components than as replacements for strong tabular baselines.
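One way to see why direct prompting can collapse after hashing is to look at what a serialized row might look like before and after masking. The sketch below is illustrative only: the abstract does not specify the serialization template or the hashing scheme, so the "column is value" template, the salted SHA-256 truncation, and the column names (`platform`, `powertrain`, `mileage_km`) are all assumptions made for this example.

```python
import hashlib


def hash_value(value: str, salt: str = "demo-salt", length: int = 8) -> str:
    """Replace a categorical value with a short, deterministic hash token.

    Assumption: salted SHA-256 truncated to `length` hex characters. The
    actual anonymization scheme used in the study is not described here.
    """
    digest = hashlib.sha256(f"{salt}|{value}".encode("utf-8")).hexdigest()
    return digest[:length]


def serialize_row(row: dict, hashed_columns: set) -> str:
    """Turn one table row into a single text string ("row serialization").

    Columns listed in `hashed_columns` have their values replaced by hash
    tokens, which removes any wording a language model could interpret.
    """
    parts = []
    for column, value in row.items():
        text = str(value)
        if column in hashed_columns:
            text = hash_value(text)
        parts.append(f"{column} is {text}")
    return "; ".join(parts)


# Hypothetical row; these column names and values are not from the dataset.
row = {"platform": "compact SUV", "powertrain": "hybrid", "mileage_km": 1200}

plain = serialize_row(row, hashed_columns=set())
masked = serialize_row(row, hashed_columns={"platform", "powertrain"})

print(plain)
print(masked)
```

In the masked string the categorical values are still consistent identifiers, so a model trained on many such rows (for example, a tree ensemble or a classifier on top of embeddings) can in principle still learn from them, whereas a prompted model has no prior meaning to attach to the tokens. This is consistent with the pattern reported above, though the sketch itself does not demonstrate it.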