We propose a cluster-based frame selection strategy to mitigate information leakage in datasets of video-derived frames. By grouping visually similar frames before splitting the data into training, validation, and test sets, the method produces more representative, balanced, and reliable dataset partitions.
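The idea of grouping similar frames before splitting can be illustrated with a minimal sketch. It assumes each frame has already been reduced to a numeric feature vector (for example an embedding or a color histogram), and it uses a simple greedy leader clustering with a distance threshold as a stand-in; the actual clustering algorithm, features, and split ratios are not specified above, so the function names `cluster_frames` and `split_by_cluster` and all parameters here are hypothetical.

```python
import math
import random
from collections import defaultdict


def cluster_frames(features, threshold):
    """Greedy leader clustering (an assumed stand-in, not the authors' method).

    A frame joins the first cluster whose leader (its first member) lies
    within `threshold` in Euclidean distance; otherwise it starts a new cluster.
    Returns one cluster id per frame.
    """
    leaders = []  # feature vector of the first frame in each cluster
    labels = []   # cluster id assigned to each frame
    for vec in features:
        for cid, leader in enumerate(leaders):
            if math.dist(vec, leader) <= threshold:
                labels.append(cid)
                break
        else:
            leaders.append(vec)
            labels.append(len(leaders) - 1)
    return labels


def split_by_cluster(labels, ratios=(0.7, 0.15, 0.15), seed=0):
    """Assign whole clusters to train/val/test so no cluster spans two splits."""
    clusters = defaultdict(list)
    for idx, cid in enumerate(labels):
        clusters[cid].append(idx)

    # Shuffle for randomness, then place large clusters first (stable sort
    # keeps the shuffled order among clusters of equal size).
    order = list(clusters)
    random.Random(seed).shuffle(order)
    order.sort(key=lambda cid: len(clusters[cid]), reverse=True)

    names = ("train", "val", "test")
    targets = [r * len(labels) for r in ratios]
    splits = {name: [] for name in names}
    for cid in order:
        # Send the cluster to the split that is furthest below its target size.
        i = max(range(len(names)), key=lambda k: targets[k] - len(splits[names[k]]))
        splits[names[i]].extend(clusters[cid])
    return splits


if __name__ == "__main__":
    # Toy 2-D features: three visually distinct groups of frames.
    features = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0),
                (10.0, 0.0), (10.0, 0.1), (0.05, 0.05)]
    labels = cluster_frames(features, threshold=1.0)
    print(labels)
    print(split_by_cluster(labels))
```

Because clusters are assigned as indivisible units, frames that look alike always land in the same partition, which is the property that limits leakage. The achieved split sizes only approximate the requested ratios when clusters are few or uneven in size.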