This paper reports the first brain-inspired large language model (BriLLM). It is a non-Transformer, non-GPT generative language model that departs from the input-output paradigm of traditional machine learning. The model is based on the Signal Fully-connected flowing (SiFu) mechanism, defined over a directed graph that serves as the neural network, and it is interpretable at every node of the graph, whereas traditional machine learning models offer only limited interpretability at their input and output ends. In the language modeling setting, each token is defined as a node in the graph. A randomly shaped or user-defined signal propagates between nodes along paths of "least resistance," and the next token to be predicted or generated is the target node of the signal flow. As a language model, BriLLM theoretically supports $n$-grams of unbounded length, since the model size is independent of the input and prediction lengths. The model's working signal flow also opens the possibility of recall activation and innate multi-modal support, similar to the cognitive patterns of the human brain. We have released the first BriLLM version in Chinese, with a 4,000-token vocabulary, 32-dimensional node width, the ability to predict sequences up to 16 tokens long, and language modeling performance comparable to GPT-1. More computing power will help us explore the possibilities outlined above.
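To make the signal-flow idea concrete, below is a minimal toy sketch in Python/NumPy. It is not the released implementation: the vocabulary size, the energy-based "least resistance" selection rule, and all names (`edges`, `next_token`) are illustrative assumptions; only the tokens-as-nodes structure, the random initial signal, and the 32-dimensional node width are taken from the abstract.

```python
import numpy as np

# Toy sketch of SiFu-style signal flow (illustrative, not the released code).
# Tokens are nodes of a fully connected directed graph; each edge (u -> v)
# carries its own weight matrix. A signal vector flows from node to node,
# and the next token is the node whose edge passes the signal with the
# greatest energy -- one plausible reading of "least resistance".

VOCAB = 10   # toy vocabulary; the released model uses 4000 tokens
DIM = 32     # node width reported in the abstract

rng = np.random.default_rng(0)
# One DIM x DIM weight matrix per directed edge.
edges = rng.normal(scale=DIM ** -0.5, size=(VOCAB, VOCAB, DIM, DIM))

def next_token(signal: np.ndarray, current: int) -> tuple[int, np.ndarray]:
    """Propagate the signal out of `current` and pick the target node
    whose edge transmits it with the highest energy (L2 norm)."""
    outputs = np.einsum("vij,j->vi", edges[current], signal)  # (VOCAB, DIM)
    energies = np.linalg.norm(outputs, axis=-1)               # (VOCAB,)
    target = int(energies.argmax())
    return target, np.tanh(outputs[target])  # squashed signal flows onward

# Generate a short sequence from a "randomly shaped" initial signal,
# as the abstract describes.
signal, node = rng.normal(size=DIM), 0
sequence = [node]
for _ in range(5):
    node, signal = next_token(signal, node)
    sequence.append(node)
print(sequence)
```

Note how the model size here scales with the vocabulary (one matrix per edge) but not with the sequence length, which is what lets the abstract claim support for $n$-grams of unbounded length.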