The search for a general model that can operate seamlessly across multiple domains remains a key goal in machine learning research. The prevailing approach in Reinforcement Learning (RL) typically restricts models to a single task in a unimodal setting, which falls short of the broader vision of a versatile, multi-domain model. In this paper, we present Jack of All Trades (JAT), a transformer-based model with a unique design optimized for sequential decision-making tasks and multimodal data. Using a single set of weights, JAT achieves strong performance on very different RL benchmarks, along with promising results on Computer Vision (CV) and Natural Language Processing (NLP) tasks, demonstrating its robustness and versatility. JAT marks a significant step towards more general, cross-domain AI model design. Notably, it is the first model of its kind to be fully open-sourced (see https://huggingface.co/jat-project/jat), together with a pioneering general-purpose dataset.