The search for a general model that operates seamlessly across multiple domains remains a key goal of machine learning research. The prevailing methodology in Reinforcement Learning (RL) typically restricts models to a single task within a unimodal framework, a limitation at odds with the broader vision of a versatile, multi-domain model. In this paper, we present Jack of All Trades (JAT), a transformer-based model whose design is optimized for sequential decision-making tasks and multimodal data types. Using a single set of weights, JAT demonstrates robust capabilities and versatility, achieving strong performance on widely varying RL benchmarks along with promising results on Computer Vision (CV) and Natural Language Processing (NLP) tasks. JAT marks a significant step toward more general, cross-domain AI model design and, notably, is the first model of its kind to be fully open-sourced (see https://huggingface.co/jat-project/jat), including a pioneering general-purpose dataset.