We present DANCEMATCH, an end-to-end framework for motion-based dance retrieval: identifying semantically similar choreographies directly from raw video, a task we term DANCE FINGERPRINTING. While existing motion analysis and retrieval methods can compare pose sequences, they rely on continuous embeddings that are difficult to index, interpret, or scale. In contrast, DANCEMATCH constructs compact, discrete motion signatures that capture the spatio-temporal structure of dance while enabling efficient large-scale retrieval. Our system integrates Skeleton Motion Quantisation (SMQ) with Spatio-Temporal Transformers (STT) to encode human poses, extracted via Apple CoMotion, into a structured motion vocabulary. We further design the DANCE RETRIEVAL ENGINE (DRE), which performs sub-linear retrieval using a histogram-based index, followed by a re-ranking stage for refined matching. To facilitate reproducible research, we release DANCETYPESBENCHMARK, a pose-aligned dataset annotated with quantised motion tokens. Experiments demonstrate robust retrieval across diverse dance styles and strong generalisation to unseen choreographies, establishing a foundation for scalable motion fingerprinting and quantitative choreographic analysis.
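To make the two-stage retrieval idea concrete, the following is a minimal sketch and not the DRE implementation. It assumes each clip has already been reduced to a sequence of integer motion tokens, and it swaps in generic stand-ins for every component: an inverted index over tokens for candidate generation, histogram intersection for coarse scoring, and Python's `difflib.SequenceMatcher` for order-aware re-ranking. The class name `ToyDanceIndex` and the parameters `shortlist` and `top_k` are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict
from difflib import SequenceMatcher


class ToyDanceIndex:
    """Illustrative two-stage index over discrete motion-token sequences."""

    def __init__(self):
        self.sequences = {}               # clip id -> list of tokens
        self.histograms = {}              # clip id -> token histogram (Counter)
        self.postings = defaultdict(set)  # token -> ids of clips containing it

    def add(self, clip_id, tokens):
        tokens = list(tokens)
        self.sequences[clip_id] = tokens
        self.histograms[clip_id] = Counter(tokens)
        for tok in set(tokens):
            self.postings[tok].add(clip_id)

    def search(self, query, shortlist=10, top_k=3):
        query = list(query)
        q_hist = Counter(query)

        # Stage 1: gather only clips that share at least one token with the
        # query, so clips with no overlap are never scored.
        candidates = set()
        for tok in q_hist:
            candidates |= self.postings.get(tok, set())

        # Coarse score: histogram intersection (sum of per-token minimum counts).
        def overlap(cid):
            return sum((q_hist & self.histograms[cid]).values())

        coarse = sorted(candidates, key=lambda c: (-overlap(c), c))[:shortlist]

        # Stage 2: re-rank the shortlist with an order-aware sequence similarity.
        scored = [
            (SequenceMatcher(None, query, self.sequences[c], autojunk=False).ratio(), c)
            for c in coarse
        ]
        scored.sort(key=lambda s: (-s[0], s[1]))
        return [(cid, score) for score, cid in scored[:top_k]]


if __name__ == "__main__":
    index = ToyDanceIndex()
    index.add("clip_a", [3, 7, 7, 2, 5])
    index.add("clip_b", [5, 2, 7, 7, 3])  # same histogram, different order
    index.add("clip_c", [9, 9, 8])        # no tokens shared with the query
    print(index.search([3, 7, 7, 2, 5]))
```

In this sketch, the coarse stage ignores temporal order entirely, so `clip_a` and `clip_b` tie on histogram overlap; only the re-ranking stage separates them. That split is the design point being illustrated: a cheap order-free filter narrows the search, and a costlier order-aware comparison runs on the shortlist alone.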