Access to expert coaching is essential for developing technique in sports, yet economic barriers often place it out of reach for many enthusiasts. To bridge this gap, we introduce Poze, a video processing framework that provides feedback on human motion, emulating the insights of a professional coach. Poze combines pose estimation with sequence comparison and is optimized to work effectively with minimal data. In a video question-answering setting, Poze surpasses state-of-the-art vision-language models, achieving accuracy increases of 70% and 196% over GPT4V and LLaVAv1.6 7b, respectively.
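To make the idea of combining pose estimation with sequence comparison concrete, the following is a minimal sketch and not the Poze implementation. The abstract does not say which comparison algorithm is used, so dynamic time warping (DTW) is an assumption here. The helper names `joint_angle` and `dtw_distance` and the sample angle values are hypothetical. The keypoints are assumed to come from some pose estimator and are reduced to one joint angle per frame.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at keypoint b, formed by the 2D keypoints a-b-c.

    Hypothetical helper: a, b, c are (x, y) tuples such as hip, knee, ankle.
    """
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        return 0.0  # degenerate case: two keypoints coincide
    cos = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to guard against floating-point values just outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def dtw_distance(seq_a, seq_b):
    """Classic O(n*m) dynamic time warping cost between two 1D sequences.

    Assumption: DTW stands in for the unspecified sequence comparison step.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(seq_a[i - 1] - seq_b[j - 1])
            # Extend the cheapest of the three allowed alignment moves.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Made-up knee-angle traces (degrees per frame) for a squat-like motion.
reference = [170, 150, 120, 90, 120, 150, 170]
slower = [170, 170, 150, 150, 120, 90, 90, 120, 150, 170]  # same motion, slower
shallow = [170, 150, 130, 110, 130, 150, 170]              # does not bend as far

print(dtw_distance(reference, slower))
print(dtw_distance(reference, shallow))
```

In this sketch the slower trace aligns with the reference at zero cost, because DTW tolerates differences in tempo, while the shallow trace yields a positive cost that could be turned into feedback about range of motion.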