Structured Intent as a Protocol-Like Communication Layer: Cross-Model Robustness, Framework Comparison, and the Weak-Model Compensation Effect

From arXiv. 25 pages, with figures, tables, and an appendix. Third paper in a cumulative research series on PPS and 5W3H structured intent representation, extending prior work to cross-model robustness, framework comparison, and user-study validation.

How reliably can structured intent representations preserve user goals across different AI models, languages, and prompting frameworks? Prior work showed that PPS (Prompt Protocol Specification), a 5W3H-based structured intent framework, improves goal alignment in Chinese and generalizes to English and Japanese. This paper extends that line of inquiry in three directions: cross-model robustness across Claude, GPT-4o, and Gemini 2.5 Pro; a controlled comparison with CO-STAR and RISEN; and a user study (N=50) of AI-assisted intent expansion in ecologically valid settings. Across 3,240 model outputs (3 languages × 6 conditions × 3 models × 3 domains × 20 tasks), evaluated by an independent judge (DeepSeek-V3), we find that structured prompting substantially reduces cross-language score dispersion relative to unstructured baselines: the strongest structured conditions reduce the cross-language standard deviation (σ) from 0.470 to about 0.020. We also observe a weak-model compensation pattern: the lowest-baseline model (Gemini) shows a much larger D-A gain (+1.006) than the strongest model (Claude, +0.217). Under the current evaluation resolution, 5W3H, CO-STAR, and RISEN achieve similarly high goal-alignment scores, suggesting that dimensional decomposition itself is an important active ingredient. In the user study, AI-expanded 5W3H prompts reduce interaction rounds by 60% and increase user satisfaction from 3.16 to 4.04. These findings support the practical value of structured intent representation as a robust, protocol-like communication layer for human-AI interaction.
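The factorial design and the two summary statistics named above (cross-language σ and the D-A gain) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' evaluation code. The condition labels "A" to "F" and the domain names are hypothetical placeholders. "Cross-language σ" is assumed to be the population standard deviation of the per-language mean scores for one condition. "D-A gain" is assumed to be the mean score under condition D minus the mean under condition A for one model. The scores are toy values chosen only to exercise the functions; they are not data from the paper.

```python
from itertools import product
from statistics import mean, pstdev

# Factor levels. Languages and models follow the abstract; the condition
# labels "A".."F" and the domain names are hypothetical placeholders.
LANGUAGES = ["zh", "en", "ja"]
CONDITIONS = ["A", "B", "C", "D", "E", "F"]
MODELS = ["claude", "gpt-4o", "gemini-2.5-pro"]
DOMAINS = ["domain_1", "domain_2", "domain_3"]
TASKS = range(20)

# Full factorial grid: one evaluated model output per cell.
GRID = list(product(LANGUAGES, CONDITIONS, MODELS, DOMAINS, TASKS))


def cross_language_sigma(scores, condition):
    """Assumed definition: population std. dev. of the per-language mean
    scores for one condition (pooled over models, domains, and tasks)."""
    per_language_means = [
        mean(s for (lang, cond, *_rest), s in scores.items()
             if lang == language and cond == condition)
        for language in LANGUAGES
    ]
    return pstdev(per_language_means)


def gain(scores, model, treated="D", baseline="A"):
    """Assumed definition: mean judge score under `treated` minus mean score
    under `baseline` for one model (pooled over languages, domains, tasks)."""
    def condition_mean(condition):
        return mean(s for (_lang, cond, m, *_rest), s in scores.items()
                    if m == model and cond == condition)
    return condition_mean(treated) - condition_mean(baseline)


# Toy placeholder scores, NOT data from the paper: condition D is identical
# across languages, while every other condition varies with language.
_BASE = {"zh": 4.0, "en": 3.0, "ja": 2.0}
TOY_SCORES = {cell: (4.5 if cell[1] == "D" else _BASE[cell[0]]) for cell in GRID}

print(len(GRID))                                 # 3240
print(cross_language_sigma(TOY_SCORES, "A"))     # ≈ 0.8165
print(cross_language_sigma(TOY_SCORES, "D"))     # 0.0
print(gain(TOY_SCORES, "gemini-2.5-pro"))        # 1.5
```

The grid size confirms the stated product 3 × 6 × 3 × 3 × 20 = 3,240. In the toy data, a condition whose scores do not depend on language yields σ = 0, which is the direction of the effect the abstract reports for structured prompting.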
