Scalable posterior approximation algorithms allow Bayesian nonparametric models such as the Dirichlet process mixture to scale to larger datasets at a fraction of the cost. Recent algorithms, notably stochastic variational inference, perform local learning from minibatches. The main limitation of stochastic variational inference is its reliance on closed-form solutions. Stochastic gradient ascent is a modern approach to machine learning and is widely deployed in the training of deep neural networks. In this work, we explore stochastic gradient ascent as a fast algorithm for the posterior approximation of the Dirichlet process mixture. However, stochastic gradient ascent alone is not optimal for learning. To achieve both speed and performance, we turn our focus to stepsize optimization in stochastic gradient ascent. As an intermediate approach, we first optimize the stepsize using the momentum method. Finally, we introduce Fisher information to allow an adaptive stepsize in our posterior approximation. In the experiments, we show that our stochastic gradient ascent approach does not sacrifice performance for speed when compared with closed-form coordinate ascent learning on the same datasets. Lastly, our approach is compatible with deep ConvNet features and scales to datasets with many classes, such as Caltech256 and SUN397.
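To make the three stepsize schemes mentioned above concrete, the following is a minimal sketch, not the paper's implementation: it shows a plain stochastic gradient ascent update on a variational parameter, a momentum variant, and an adaptive stepsize that preconditions the gradient by a diagonal Fisher information estimate. The function names and the diagonal Fisher approximation are illustrative assumptions.

```python
import numpy as np

def sga_step(lam, grad, lr=0.01):
    """Plain stochastic gradient ascent on a variational parameter lam,
    using a noisy minibatch gradient of the objective."""
    return lam + lr * grad

def momentum_step(lam, grad, velocity, lr=0.01, beta=0.9):
    """SGA with momentum: the velocity accumulates past gradients and
    smooths the noise introduced by minibatch estimation."""
    velocity = beta * velocity + lr * grad
    return lam + velocity, velocity

def fisher_adaptive_step(lam, grad, fisher_diag, lr=0.01, eps=1e-8):
    """Adaptive stepsize: rescale each coordinate of the gradient by the
    inverse (diagonal) Fisher information, a natural-gradient-style update."""
    return lam + lr * grad / (fisher_diag + eps)
```

As a usage illustration, each minibatch would supply `grad` (and, for the adaptive rule, an estimate of `fisher_diag`), and the chosen update rule would be applied to the variational parameters of the Dirichlet process mixture until the objective stabilizes.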