Across the broad landscape of experimental design, regression has been a powerful tool for accurately predicting the outcome metrics of a system or model given a set of parameters, but it has traditionally been restricted to methods that apply only to a specific task. In this paper, we propose OmniPred, a framework for training language models as universal end-to-end regressors over $(x,y)$ evaluation data from diverse real-world experiments. Using data sourced from Google Vizier, one of the largest blackbox optimization databases in the world, our extensive experiments demonstrate that, using only textual representations of mathematical parameters and values, language models are capable of very precise numerical regression and, when given the opportunity to train over multiple tasks, can significantly outperform traditional regression models.
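To make the idea of "textual representations of mathematical parameters and values" concrete, the following is a minimal sketch of how one $(x,y)$ pair might be turned into text for a language model. The abstract does not specify the serialization, so everything here is an assumption chosen for illustration: the function names `serialize_x`, `serialize_y`, and `deserialize_y`, the `key:value` layout for parameters, and the sign/digit/exponent token scheme for the metric are all hypothetical.

```python
import re


def serialize_x(params: dict) -> str:
    """Render parameters as a comma-separated key:value string (assumed format)."""
    # Sorting the keys makes the text independent of dict insertion order.
    return ",".join(f"{key}:{params[key]}" for key in sorted(params))


def serialize_y(y: float, digits: int = 4) -> str:
    """Encode a float as sign, mantissa-digit, and base-10 exponent tokens.

    The token scheme is a hypothetical illustration, not a documented format.
    """
    sign = "<->" if y < 0 else "<+>"
    # Scientific notation with `digits` significant figures, e.g. "2.500e-01".
    mantissa, exponent = f"{abs(y):.{digits - 1}e}".split("e")
    digit_tokens = "".join(f"<{d}>" for d in mantissa.replace(".", ""))
    # Shift the exponent so the mantissa digits read as an integer.
    return f"{sign}{digit_tokens}<E{int(exponent) - (digits - 1)}>"


def deserialize_y(text: str) -> float:
    """Invert serialize_y: parse the tokens back into a float."""
    tokens = re.findall(r"<([^<>]+)>", text)
    sign = -1.0 if tokens[0] == "-" else 1.0
    mantissa = "".join(tokens[1:-1])
    exponent = tokens[-1][1:]  # drop the leading "E"
    return sign * float(f"{mantissa}e{exponent}")


if __name__ == "__main__":
    x_text = serialize_x({"lr": 0.01, "layers": 3})
    y_text = serialize_y(0.25)
    print(x_text, "->", y_text)
    print(deserialize_y(y_text))
```

Under these assumptions, a regressor trained on such pairs would read the parameter string as its input and emit the metric tokens as its output, which are then decoded back to a number. A fixed number of mantissa digits keeps the target sequence short and of constant length, at the cost of rounding the metric to that precision.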