Large Language Models (LLMs) have achieved remarkable progress, with Parameter-Efficient Fine-Tuning (PEFT) emerging as a key technique for downstream task adaptation. However, existing PEFT methods operate mainly in Euclidean space, fundamentally limiting their capacity to capture the complex geometric structures inherent in language data. While alternative geometric spaces, such as hyperbolic geometry for hierarchical data and spherical manifolds for circular patterns, offer theoretical advantages, forcing representations into a single manifold type ultimately limits expressiveness, even when curvature parameters are learnable. To address this, we propose Mixture of Space (MoS), a unified framework that leverages multiple geometric spaces simultaneously to learn richer, curvature-aware representations. Building on this framework, we develop MoSLoRA, which extends Low-Rank Adaptation (LoRA) with heterogeneous geometric experts, enabling models to dynamically select or combine geometric spaces according to the input context. To mitigate the computational overhead of frequent manifold switching, we further design a lightweight routing mechanism, and we provide empirical insights into how curvature optimization affects training stability and model performance. Experiments across diverse benchmarks demonstrate that MoSLoRA consistently outperforms strong baselines, with gains of up to 5.6% on MATH500 and 15.9% on MAWPS.
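To make the architectural idea concrete, the sketch below shows one plausible way a mixture-of-spaces LoRA adapter could be structured: a frozen base linear layer, a shared low-rank down-projection, several "geometric experts" with learnable curvature, and a single-linear-layer router that mixes their outputs. This is a minimal illustration under our own assumptions; the class name `MoSLoRALinear`, the tanh/sine stand-ins for hyperbolic and spherical maps, and all hyperparameters are hypothetical and not taken from the paper's implementation.

```python
# Minimal sketch of a mixture-of-spaces LoRA adapter (illustrative assumptions only).
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoSLoRALinear(nn.Module):
    """Frozen base linear layer plus a mixture-of-spaces low-rank adapter (sketch)."""

    def __init__(self, in_features: int, out_features: int, rank: int = 8):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)  # base weights stay frozen, as in LoRA
        self.base.bias.requires_grad_(False)

        # Shared low-rank down-projection (LoRA's A matrix) and per-expert up-projections.
        self.lora_A = nn.Linear(in_features, rank, bias=False)
        self.experts = nn.ModuleList(
            [nn.Linear(rank, out_features, bias=False) for _ in range(3)]
        )
        # Learnable curvature magnitudes for the two curved experts (assumption).
        self.log_curvature = nn.Parameter(torch.zeros(2))

        # Lightweight router: a single linear map producing mixture weights,
        # so expert selection stays cheap relative to the adapter itself.
        self.router = nn.Linear(in_features, 3, bias=False)
        self.scaling = 1.0 / rank

    def _expert_outputs(self, z: torch.Tensor) -> list:
        c_hyp, c_sph = F.softplus(self.log_curvature).unbind()
        euclidean = self.experts[0](z)
        # Stand-ins for hyperbolic/spherical maps: tanh- and sine-based squashings
        # scaled by learned curvature. Real manifold exp/log maps would go here.
        hyperbolic = self.experts[1](torch.tanh(c_hyp * z) / c_hyp)
        spherical = self.experts[2](torch.sin(c_sph * z) / c_sph)
        return [euclidean, hyperbolic, spherical]

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = F.softmax(self.router(x), dim=-1)          # (..., 3) mixture weights
        z = self.lora_A(x)
        outputs = torch.stack(self._expert_outputs(z), dim=-1)  # (..., out, 3)
        delta = (outputs * weights.unsqueeze(-2)).sum(dim=-1)
        return self.base(x) + self.scaling * delta


if __name__ == "__main__":
    layer = MoSLoRALinear(in_features=64, out_features=64, rank=8)
    x = torch.randn(4, 64)
    print(layer(x).shape)  # torch.Size([4, 64])
```

In this sketch the router produces soft per-token mixture weights rather than hard expert selection; a top-1 router would be the natural alternative when the goal is to avoid evaluating every manifold map on every token.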