Long-context processing with large language models (LLMs) remains challenging due to implementation complexity, training inefficiency, and data sparsity. To address these issues, we propose a new paradigm, Online Long-context Processing (OLP), for processing documents of unbounded length, a setting that commonly arises when receiving and organizing information from diverse streaming media such as automated news reporting, live e-commerce, and viral short videos. Moreover, with the explosive growth in the number of available LLMs, selecting the single most suitable model that simultaneously offers outstanding performance, affordable cost, and low response latency often poses a dilemma. To this end, we further develop Role Reinforcement Learning (Role-RL), which automatically deploys different LLMs into their best-suited roles within the OLP pipeline according to their measured performance. Extensive experiments on our OLP-MINI dataset show that OLP with the Role-RL framework achieves an average recall rate of 93.2% on the OLP benchmark while reducing LLM cost by 79.4%. The code and dataset are publicly available at: https://anonymous.4open.science/r/Role-RL.
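The abstract does not specify how Role-RL assigns LLMs to roles; one way to picture the idea is as a per-role bandit that balances observed recall against cost. The sketch below is purely illustrative (class and method names are our own, not the paper's API), assuming an epsilon-greedy reward estimate per (role, LLM) pair:

```python
import random

class RoleAssigner:
    """Hypothetical sketch of Role-RL-style role assignment: each
    pipeline role keeps a running reward estimate per candidate LLM
    and picks greedily, exploring with probability epsilon."""

    def __init__(self, roles, llms, epsilon=0.1):
        self.epsilon = epsilon
        self.llms = list(llms)
        # running [reward_sum, count] per (role, llm) pair
        self.stats = {r: {m: [0.0, 0] for m in llms} for r in roles}

    def pick(self, role):
        """Choose an LLM for `role`, exploring with prob. epsilon."""
        if random.random() < self.epsilon:
            return random.choice(self.llms)
        est = self.stats[role]
        return max(self.llms, key=lambda m: est[m][0] / max(est[m][1], 1))

    def update(self, role, llm, recall, cost, cost_weight=0.5):
        """Reward trades off recall against (normalized) cost."""
        s = self.stats[role][llm]
        s[0] += recall - cost_weight * cost
        s[1] += 1
```

Under this reading, cheaper models gradually win roles where their recall is comparable, which is consistent with the cost savings the abstract reports, though the actual Role-RL algorithm may differ substantially.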