The increasing complexity of modern deep neural network models and the growing sizes of datasets necessitate the development of optimized and scalable training methods. In this white paper, we address the challenge of efficiently training neural network models on sequences of varying lengths. We propose a novel training scheme that enables efficient distributed data-parallel training on sequences of different lengths with minimal overhead. Using this scheme, we reduced the amount of padding by more than $100\times$ without dropping a single frame, which improved both training time and recall in our experiments.
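The white paper does not spell out the batching mechanism here, but a common way to cut padding for variable-length sequences is to group samples of similar length into the same batch. The sketch below is a minimal, illustrative length-bucketed batch sampler in PyTorch; the class name `LengthBucketedSampler` and the `lengths` argument are assumptions for illustration, not the scheme described in this paper.

```python
import random
from torch.utils.data import Sampler


class LengthBucketedSampler(Sampler):
    """Illustrative sketch: batch sequences of similar length to minimize padding.

    `lengths` is a hypothetical list with the length of each sequence in the dataset.
    """

    def __init__(self, lengths, batch_size, shuffle=True):
        self.lengths = lengths
        self.batch_size = batch_size
        self.shuffle = shuffle

    def __iter__(self):
        # Sort indices by sequence length so that neighbors have similar sizes.
        order = sorted(range(len(self.lengths)), key=lambda i: self.lengths[i])
        # Slice the sorted indices into batches; each batch then needs little padding.
        batches = [order[i:i + self.batch_size]
                   for i in range(0, len(order), self.batch_size)]
        if self.shuffle:
            # Shuffle whole batches so the training order stays stochastic across epochs.
            random.shuffle(batches)
        for batch in batches:
            yield batch

    def __len__(self):
        return (len(self.lengths) + self.batch_size - 1) // self.batch_size
```

Such a sampler would typically be passed to a `DataLoader` via its `batch_sampler` argument together with a padding collate function, so each rank in a data-parallel job pads only to the longest sequence within its own batch.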