Large language models are well known for their impressive performance across a wide range of tasks. One surprising example is the recently identified capacity of LLMs to understand the governing principles of dynamical systems that satisfy the Markov property. In this paper, we explore this direction further by studying the dynamics of stochastic gradient descent (SGD) in convex and non-convex optimization. Leveraging the theoretical link between SGD and Markov chains, we show that LLMs exhibit remarkable zero-shot performance in predicting the local minima to which SGD converges from previously unseen starting points. More generally, we inquire into the possibility of using LLMs to perform zero-shot randomized trials for the larger deep learning models used in practice.
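As a minimal illustration of the prediction task described above (a hedged sketch, not the paper's actual experimental setup), the Python snippet below runs noisy gradient descent on a hypothetical one-dimensional non-convex objective with two local minima and records which basin each starting point falls into; an LLM would be asked to predict this outcome zero-shot for unseen starting points. The objective, step size, and noise scale are illustrative choices, not taken from the paper.

```python
import numpy as np

# Hypothetical 1-D non-convex objective with two local minima
# (near x ~ -1.30 and x ~ +1.13, separated by a local max near x ~ 0.17):
# f(x) = x^4 - 3x^2 + x, so f'(x) = 4x^3 - 6x + 1.
def grad(x):
    return 4 * x**3 - 6 * x + 1

def sgd_final_iterate(x0, lr=0.01, noise=0.1, steps=2000, seed=0):
    """Noisy gradient descent from x0. The iterates form a Markov chain:
    each state depends only on the previous state plus i.i.d. gradient noise."""
    rng = np.random.default_rng(seed)
    x = x0
    for _ in range(steps):
        x -= lr * (grad(x) + noise * rng.standard_normal())
    return x

# Record the local minimum reached from several starting points; this
# (starting point -> basin) map is what the LLM is asked to predict.
for x0 in [-2.0, -0.5, 0.1, 0.5, 2.0]:
    print(f"start {x0:+.1f} -> converged near {sgd_final_iterate(x0):+.3f}")
```

With the small noise scale used here, each trajectory stays inside the basin of attraction of its starting point, so the map from starting point to local minimum is effectively deterministic, which is what makes the zero-shot prediction task well posed.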