Applications built on top of Large Language Models (LLMs) such as GPT-4 represent a revolution in AI due to their human-level capabilities in natural language processing. However, they also pose significant risks, such as the presence of biased, private, or harmful text and the unauthorized inclusion of copyrighted material. We introduce h2oGPT, a suite of open-source code repositories for the creation and use of LLMs based on Generative Pretrained Transformers (GPTs). The goal of this project is to create the world's best truly open-source alternative to closed-source approaches. In collaboration with, and as part of, the incredible and unstoppable open-source community, we open-source several fine-tuned h2oGPT models ranging from 7 to 40 billion parameters, ready for commercial use under the fully permissive Apache 2.0 license. Our release also includes 100\% private document search using natural language. Open-source language models help boost AI development and make it more accessible and trustworthy. They lower entry hurdles, allowing people and groups to tailor these models to their needs. This openness increases innovation, transparency, and fairness. An open-source strategy is needed to share the benefits of AI fairly, and H2O.ai will continue to democratize AI and LLMs.