While video-LLMs with advanced reasoning capabilities are progressing rapidly, prior work shows that these models struggle on the challenging task of sports feedback generation and require finetuning feedback data that is expensive and difficult to collect for each sport. This limitation is evident in their poor generalization to sports unseen during finetuning. Furthermore, traditional text generation metrics (e.g., BLEU-4, METEOR, ROUGE-L, BERTScore), originally developed for machine translation and summarization, fail to capture the unique aspects of sports feedback quality. To address the first problem, using rock climbing as our case study, we propose supplementing existing sports feedback from a disjoint source domain with auxiliary, freely available web data from the target domain, such as competition videos and coaching manuals, to improve sports feedback generation on the target domain. To improve evaluation, we propose two metrics: (1) specificity and (2) actionability. Together, these contributions enable more meaningful and practical generation of sports feedback under limited annotations.