This paper describes our system for SemEval-2026 Task 6, which addresses the classification of political evasion strategies in English question-answer pairs extracted from U.S. presidential interviews. We systematically compare two paradigms: (1) parameter-efficient fine-tuning of Qwen3 models (4B–32B) with QLoRA, combined with tiered upsampling and weighted cross-entropy loss to mitigate severe class imbalance, and (2) structured Chain-of-Thought (CoT) prompting of reasoning-capable API models, namely DeepSeek-V3.2 and Grok-4-Fast. Our evaluation shows that structured CoT prompting of reasoning-enabled models substantially outperforms our baseline fine-tuning implementation in absolute Macro F1. Our best system, Grok-4-Fast with extended reasoning and few-shot hierarchical CoT prompting, achieves a Macro F1 of 0.5147 on Subtask 2 (9-class evasion) and 0.7979 on Subtask 1 (3-class clarity), ranking 8th of 33 teams on Subtask 2 and 13th of 41 teams on Subtask 1 on the official leaderboard. Our ablation studies further suggest two principles for prompt design in evasion detection: presenting labels within a hierarchical taxonomy helps structure the model's reasoning, and few-shot exemplars calibrate the model to the task. The strongest prompt variants, however, are not statistically distinguishable in Macro F1. By contrast, explicitly enabling extended reasoning modes yields substantial gains, which we attribute to the multi-step pragmatic analysis required to detect evasive intent.
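As a minimal sketch of how weighted cross-entropy counteracts class imbalance, the snippet below assumes inverse-frequency class weights, w_c = N / (K · n_c), where N is the number of training examples, K the number of classes, and n_c the count of class c. The abstract does not specify the weighting scheme, so this choice, the helper names `inverse_frequency_weights` and `weighted_cross_entropy`, and the placeholder labels are hypothetical. Each example's loss -log p(y) is scaled by its class weight, and the total is normalized by the summed target-class weights, which is the same normalization PyTorch's `CrossEntropyLoss` applies when given a `weight` argument with mean reduction.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Hypothetical helper: w_c = N / (K * n_c), so rarer classes get larger weights."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}


def weighted_cross_entropy(probs, targets, weights):
    """Weighted mean of -log p(y_i), normalized by the sum of target-class weights.

    probs: list of dicts mapping class label -> predicted probability.
    targets: list of gold class labels, aligned with probs.
    """
    num = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, targets))
    den = sum(weights[y] for y in targets)
    return num / den


# Placeholder labels standing in for an imbalanced label distribution (assumption).
labels = ["a"] * 6 + ["b"] * 3 + ["c"]
weights = inverse_frequency_weights(labels)
uniform = [{"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}] * len(labels)
loss = weighted_cross_entropy(uniform, labels, weights)  # equals log(3) for uniform predictions
```

Under this scheme every class contributes the same total weight (N / K) to the loss, so errors on a rare evasion label are not drowned out by errors on frequent ones.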