The increasing complexity of modern deep neural network models and the growing size of datasets call for optimized, scalable training methods. In this white paper, we address the challenge of efficiently training neural network models on sequences of varying sizes. We propose a novel training scheme that enables efficient distributed data-parallel training on such sequences with minimal overhead. With this scheme, we reduced the amount of padding by more than 100$\times$ without deleting a single frame, which improved both training time and Recall in our experiments.
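To make the padding cost concrete, the following minimal sketch counts padded frames when every batch is padded to its longest sequence, first with sequences batched in arbitrary order and then with sequences grouped by length. It is an illustration only and not the scheme proposed in this paper: length-sorted batching is a common baseline swapped in here for demonstration, and the helper name `padding_frames`, the synthetic length distribution, and the batch size of 32 are all assumptions.

```python
import random


def padding_frames(lengths, batch_size):
    """Count padded frames when each batch is padded to its longest sequence.

    Hypothetical helper for illustration; batches are consecutive slices
    of `lengths`, so the order of `lengths` determines the batching.
    """
    total = 0
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        longest = max(batch)
        # Each sequence is padded up to the longest one in its batch.
        total += sum(longest - n for n in batch)
    return total


# Synthetic sequence lengths in frames (an assumption, not the paper's data).
random.seed(0)
lengths = [random.randint(10, 1000) for _ in range(4096)]

# Arbitrary order: long and short sequences share batches.
naive = padding_frames(lengths, batch_size=32)

# Length-grouped order: each batch holds sequences of similar length.
bucketed = padding_frames(sorted(lengths), batch_size=32)

print(f"padding, arbitrary order: {naive} frames")
print(f"padding, length-grouped:  {bucketed} frames")
```

Grouping by length lowers padding without dropping any frame, but in a data-parallel setting it can leave workers with unequal amounts of work per step, which is the kind of overhead a scheme for sequences of different sizes has to control.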