This paper presents a novel, fast machine learning method that combines two techniques: Vector Embedding on Orthonormal Basis (VEOB) and Spectral Transform (ST). VEOB converts the original data encoding into a vector embedding whose coordinates are projections onto an orthonormal basis. Singular Value Decomposition (SVD) is used to compute the basis vectors and the projection coordinates. This improves distance measurement in the embedding space and enables data compression by retaining only the projection vectors associated with the largest singular values. ST, in turn, transforms a sequence of vectors into spectral space. By applying the Discrete Cosine Transform (DCT) and keeping only the most significant components, it simplifies the handling of long vector sequences. The paper gives examples of word embedding, text-chunk embedding, and image embedding, implemented in the Julia language with a vector database. It also investigates unsupervised and supervised learning with this method, along with strategies for handling large data volumes.
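The two steps described above can be illustrated with a short Python/NumPy sketch. It is a minimal illustration under stated assumptions, not the paper's Julia implementation. The names `veob_embed`, `dct_ii_matrix`, and `spectral_transform` are hypothetical. The data matrix is used as-is, with no mean-centering before the SVD. "Most significant components" is read as the lowest-frequency DCT coefficients.

```python
import numpy as np


def veob_embed(X, k):
    """Sketch of VEOB: project the rows of X (n_samples x n_features)
    onto the k right singular vectors with the largest singular values.
    Assumption: X is used as-is (no mean-centering)."""
    # Thin SVD: X = U @ diag(s) @ Vt. np.linalg.svd returns s in descending
    # order, and the rows of Vt are orthonormal basis vectors.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    basis = Vt[:k]           # k x n_features, orthonormal rows (compression)
    coords = X @ basis.T     # n_samples x k projection coordinates
    return basis, coords


def dct_ii_matrix(n):
    """Orthonormal DCT-II matrix C (n x n), so that C @ C.T == I."""
    j = np.arange(n)                 # sample index (columns)
    k = j.reshape(-1, 1)             # frequency index (rows)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * j + 1) * k / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)       # DC row uses the 1/sqrt(n) scale
    return C


def spectral_transform(seq, m):
    """Sketch of ST: DCT along the sequence axis of seq (length x dim),
    keeping the m lowest-frequency coefficients (assumed to be the
    'most significant'). Sequences shorter than m are zero-padded."""
    length, dim = seq.shape
    spectrum = dct_ii_matrix(length) @ seq   # length x dim, one DCT per column
    out = np.zeros((m, dim))
    r = min(m, length)
    out[:r] = spectrum[:r]
    return out


# Toy usage with random data (hypothetical sizes).
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))          # 100 samples x 20 features
basis, coords = veob_embed(X, 5)            # 5-dimensional embedding
seq = rng.standard_normal((50, 5))          # a sequence of 50 vectors
fixed = spectral_transform(seq, 8).ravel()  # fixed-length 40-value summary
```

Because `spectral_transform` returns an `m x dim` array for any sequence length, its flattened output has a fixed size. That property suits storage in a vector database. Selecting coefficients by magnitude instead of by frequency order is an equally plausible reading of the abstract.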