Drawing on 23 anonymized student projects from a fourth-year Machine Translation and Post-editing course in a BA-level translation programme, this paper examines how structured comparison of general-purpose LLMs and online MT systems can elicit evaluative judgement in AI-mediated translation. Students translated short specialised English Wikipedia texts into Catalan or Spanish, generated four system outputs, evaluated them using automatic metrics and human adequacy/fluency assessment, selected one output for post-editing, and justified their decision in written reports. Descriptive counts are reported for all 23 projects, while qualitative interpretation is based on the 22 cases accompanied by written reports. Results show that students did not treat automatic metrics as the final authority: their final post-editing selections often diverged from metric rankings and were justified in terms of adequacy, fluency, terminology, naturalness, and expected post-editing effort. The study does not benchmark systems under controlled conditions; rather, it analyses how students justified system choice within an authentic classroom assignment.