Recent work on mini-batch consistency (MBC) for set functions has brought attention to the need for sequentially processing and aggregating chunks of a partitioned set while guaranteeing the same output for every partition. However, existing constraints on MBC architectures lead to models with limited expressive power. Additionally, prior work has not addressed how to handle large sets during training when the full-set gradient is required. To address these issues, we propose a Universally MBC (UMBC) class of set functions that can be used in conjunction with arbitrary non-MBC components while still satisfying MBC, enabling a wider range of function classes to be used in MBC settings. Furthermore, we propose an efficient MBC training algorithm that gives an unbiased approximation of the full-set gradient and has a constant memory overhead for any set size at both training and test time. We conduct extensive experiments, including image completion, text classification, unsupervised clustering, and cancer detection on high-resolution images, to verify the efficiency and efficacy of our scalable set encoding framework.
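To make the mini-batch consistency property concrete, the following is a minimal sketch, not the UMBC architecture itself. It assumes a hypothetical per-element feature map `encode` and a simple sum-and-count running state; all function names here are illustrative and do not come from the paper. It shows that accumulating sufficient statistics chunk by chunk gives the same output for any partition of the set, whereas averaging per-chunk means does not.

```python
import math
import random


def encode(x):
    # Hypothetical per-element feature map (an assumption for illustration).
    return (x, x * x, math.tanh(x))


def update(state, chunk):
    # Fold one chunk into the running state: feature sums plus element count.
    sums, count = state
    for x in chunk:
        feats = encode(x)
        sums = tuple(s + f for s, f in zip(sums, feats))
        count += 1
    return sums, count


def encode_in_chunks(chunks):
    # Process chunks sequentially with constant-size state, then normalize.
    state = ((0.0, 0.0, 0.0), 0)
    for chunk in chunks:
        state = update(state, chunk)
    sums, count = state
    return tuple(s / count for s in sums)


def mean_of_chunk_means(chunks):
    # Counterexample: averaging per-chunk means depends on the partition.
    means = [sum(c) / len(c) for c in chunks]
    return sum(means) / len(means)


def random_partition(items, rng):
    # Shuffle the set and split it into chunks of random size 1..4.
    items = list(items)
    rng.shuffle(items)
    chunks, i = [], 0
    while i < len(items):
        size = rng.randint(1, 4)
        chunks.append(items[i:i + size])
        i += size
    return chunks
```

Calling `encode_in_chunks([data])` on the whole set and `encode_in_chunks(random_partition(data, rng))` on any partition should agree up to floating-point rounding, while `mean_of_chunk_means` generally changes with the partition. The running state has a fixed size regardless of how many elements have been seen, which is the property that keeps memory constant at test time in this toy setting.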