We introduce EmoLoom-2B, a lightweight and reproducible pipeline that turns small language models under 2B parameters into fast screening candidates for joint emotion classification and Valence-Arousal-Dominance (VAD) prediction. To ensure protocol-faithful and fair evaluation, we unify data loading, training, and inference under a single JSON input-output contract and remove avoidable variance by adopting KV-off decoding as the default setting. We incorporate two orthogonal semantic regularizers: a VAD-preserving constraint that aligns generated text with target VAD triples, and a lightweight external appraisal classifier that provides training-time guidance on goal attainment, controllability, certainty, and fairness without injecting long rationales. To improve polarity sensitivity, we introduce Valence Flip augmentation based on mirrored emotional pairs. During supervised fine-tuning, we apply A/B mixture sampling with entropy-aware temperature scheduling to balance coverage and convergence. Using Qwen-1.8B-Chat as the base model, EmoLoom-2B achieves strong performance on GoEmotions and EmpatheticDialogues, and demonstrates robust cross-corpus generalization on DailyDialog. The proposed recipe is budget-aware, auditable, and re-entrant, serving as a dependable screening pass before heavier training or multimodal fusion.
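A minimal sketch of what the single JSON input-output contract might look like for a joint emotion-plus-VAD record. The field names (`text`, `emotion`, `vad`) and the [-1, 1] VAD range are illustrative assumptions, not the paper's actual schema.

```python
import json

# Hypothetical contract fields for one training/inference record;
# these names are assumptions for illustration only.
REQUIRED_FIELDS = {"text", "emotion", "vad"}

def validate_record(record: dict) -> dict:
    """Check one record against the assumed JSON contract and return it."""
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    v, a, d = record["vad"]  # valence, arousal, dominance triple
    for x in (v, a, d):
        if not -1.0 <= float(x) <= 1.0:
            raise ValueError("VAD values expected in [-1, 1]")
    return record

example = {
    "text": "I finally passed the exam!",
    "emotion": "joy",
    "vad": [0.8, 0.6, 0.5],
}
print(json.dumps(validate_record(example)))
```

Keeping loading, training, and inference behind one validator like this is one way to make the protocol auditable: any record that reaches the model has already passed the same contract check.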
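A hedged sketch of the Valence Flip augmentation on mirrored emotional pairs. The pair table, the [-1, 1] valence convention, and the choice to leave arousal and dominance untouched are illustrative assumptions; the paper's actual augmentation (including how the text itself is rewritten) may differ.

```python
from typing import Optional

# Illustrative mirror table; the real pair inventory is an assumption here.
MIRROR_PAIRS = {
    "joy": "sadness", "sadness": "joy",
    "admiration": "disgust", "disgust": "admiration",
}

def valence_flip(example: dict) -> Optional[dict]:
    """Return a polarity-mirrored copy of an example, or None if no mirror exists.

    Only the label and the valence coordinate are flipped; rewriting the
    surface text to match the new polarity is omitted in this sketch.
    """
    mirrored_label = MIRROR_PAIRS.get(example["emotion"])
    if mirrored_label is None:
        return None
    v, a, d = example["vad"]
    # Negate valence; keep arousal and dominance unchanged.
    return {"emotion": mirrored_label, "vad": [-v, a, d]}

aug = valence_flip({"emotion": "joy", "vad": [0.8, 0.6, 0.5]})
print(aug)  # {'emotion': 'sadness', 'vad': [-0.8, 0.6, 0.5]}
```

Pairing each example with its mirrored counterpart gives the model explicit contrastive pressure along the valence axis, which is the polarity sensitivity the abstract targets.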