Production deployments in complex systems require ML architectures that are highly efficient and applicable to multiple tasks. Classification problems in which data arrives as a stream and each class is presented separately are particularly demanding. Recent methods based on stochastic gradient learning have been shown to struggle in such setups, or they carry limitations, such as reliance on memory buffers or restriction to specific domains, that prevent their use in real-world scenarios. For this reason, we present a fully differentiable architecture based on the Mixture of Experts model that enables the training of high-performance classifiers when examples from each class are presented separately. We conducted extensive experiments that demonstrate its applicability in various domains and its ability to learn online in production environments. The proposed technique achieves state-of-the-art results without a memory buffer and clearly outperforms the reference methods.