Activity progress prediction aims to estimate what percentage of an activity has been completed. Currently, this is done with machine learning approaches trained and evaluated on complex, realistic video datasets. The videos in these datasets vary drastically in length and appearance, and some of the activities have unanticipated developments, making activity progression difficult to estimate. In this work, we examine the results obtained by existing progress prediction methods on these datasets. We find that current progress prediction methods appear not to extract useful visual information for the progress prediction task and consequently fail to exceed simple frame-counting baselines. We design a precisely controlled dataset for activity progress prediction, and on this synthetic dataset we show that the considered methods can make use of the visual information when it directly relates to progress. We conclude that the progress prediction task is ill-posed on the currently used real-world datasets. Moreover, to fairly measure activity progression, we advise considering a simple but effective frame-counting baseline.
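The abstract does not spell out how a frame-counting baseline is computed. One plausible reading, sketched here under stated assumptions, is a predictor that ignores the video content and maps the current frame index to a progress value using only the average training video length. The function names, the use of the mean length, the clipping at 1.0, and the linear ground truth `t / T` are all illustrative assumptions, not details taken from the paper.

```python
from statistics import mean


def fit_mean_length(train_lengths):
    """Average number of frames per training video (assumed statistic)."""
    return mean(train_lengths)


def frame_counting_progress(frame_index, expected_length):
    """Progress estimate in [0, 1] from the 1-based frame index alone.

    No visual information is used; the estimate is clipped at 1.0 for
    videos that run longer than the expected length.
    """
    return min(frame_index / expected_length, 1.0)


def mean_absolute_error(video_length, expected_length):
    """MAE of the baseline against an assumed linear ground truth t / T."""
    errors = [
        abs(frame_counting_progress(t, expected_length) - t / video_length)
        for t in range(1, video_length + 1)
    ]
    return mean(errors)


if __name__ == "__main__":
    # Hypothetical training-set video lengths, in frames.
    expected = fit_mean_length([80, 100, 120])
    print(frame_counting_progress(50, expected))
    print(mean_absolute_error(120, expected))
```

A learned method that cannot beat the error of such a content-blind predictor on a given dataset is, in this sense, not demonstrably using the visual input.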