Applications in labor market intelligence demand specialized NLP systems for a wide range of tasks, characterized by extreme multi-label target spaces, strict latency constraints, and multiple text modalities such as skills and job titles. These constraints have led to isolated, task-specific developments in the field, with models and benchmarks focused on single prediction tasks. Exploiting the shared structure of work-related data, we propose a unifying framework with two components: a multi-task ranking benchmark that combines a wide range of tasks, and a flexible architecture that tackles text-driven work tasks with a single model. The benchmark, WorkBench, is the first unified evaluation suite spanning six work-related tasks formulated explicitly as ranking problems, curated from real-world ontologies and human-annotated resources. WorkBench enables cross-task analysis, which reveals significant positive transfer between tasks. This insight leads to Unified Work Embeddings (UWE), a task-agnostic bi-encoder that exploits our training-data structure with a many-to-many InfoNCE objective and leverages token-level embeddings through task-agnostic soft late interaction. UWE demonstrates zero-shot ranking performance on unseen target spaces in the work domain and enables low-latency inference with two orders of magnitude fewer parameters than best-performing generalist models (Qwen3-8B), while achieving a +4.4 MAP improvement.
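The abstract names two mechanisms without giving their formulas: soft late interaction over token-level embeddings and a many-to-many InfoNCE objective. The sketch below illustrates one plausible reading of each in plain Python and is not the authors' implementation. The function names, the softmax-weighted replacement for a hard token-wise maximum, the choice to let each positive pair compete against all in-batch targets, and the temperature values are all assumptions made for illustration.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def soft_late_interaction(query_tokens, target_tokens, temperature=0.1):
    """Assumed form of a 'soft' late-interaction score.

    For each query token, similarities to all target tokens are combined with
    softmax weights (a smooth stand-in for a hard max), then averaged over
    query tokens.
    """
    total = 0.0
    for q in query_tokens:
        sims = [dot(q, t) for t in target_tokens]
        top = max(sims)  # subtract the max for numerical stability
        weights = [math.exp((s - top) / temperature) for s in sims]
        norm = sum(weights)
        total += sum(w * s for w, s in zip(weights, sims)) / norm
    return total / len(query_tokens)


def many_to_many_info_nce(scores, positives, temperature=0.05):
    """Assumed multi-positive InfoNCE loss.

    scores[i][j]    : similarity between query i and target j
    positives[i][j] : True when target j is a correct label for query i
    Each positive pair contributes -log softmax_j(scores[i] / temperature),
    and the loss is the mean over all positive pairs.
    """
    losses = []
    for row, pos_row in zip(scores, positives):
        logits = [s / temperature for s in row]
        top = max(logits)
        log_norm = top + math.log(sum(math.exp(l - top) for l in logits))
        for logit, is_positive in zip(logits, pos_row):
            if is_positive:
                losses.append(log_norm - logit)
    return sum(losses) / len(losses)


# Toy batch: two queries (e.g. job titles) and three targets (e.g. skills),
# each represented by a couple of 2-dimensional token embeddings.
queries = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0]]]
targets = [[[1.0, 0.0]], [[0.0, 1.0], [1.0, 0.0]], [[0.0, 1.0]]]
positives = [[True, True, False], [False, False, True]]

scores = [[soft_late_interaction(q, t) for t in targets] for q in queries]
loss = many_to_many_info_nce(scores, positives)
```

In this reading, a query with several correct targets contributes one loss term per positive pair, which is what lets a single batch mix multi-label tasks with differently sized target spaces.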