FARM: Field-Aware Resolution Model for Intelligent Trigger-Action Automation

Trigger-Action Programming (TAP) platforms such as IFTTT and Zapier enable Web of Things (WoT) automation by composing event-driven rules across heterogeneous services. A TAP applet links a trigger to an action, and it can execute only after trigger outputs (ingredients) are bound to action inputs (fields). Prior work largely treats TAP as service-level prediction from natural language. This often yields non-executable applets that still require manual configuration. We instead study the function-level configuration problem: generating complete applets with correct ingredient-to-field bindings. We propose FARM (Field-Aware Resolution Model), a two-stage architecture for automated applet generation with full configuration. Stage 1 trains contrastive dual encoders with selective layer freezing over schema-enriched representations. It retrieves candidates from 1,724 trigger functions and 1,287 action functions, which together form about 2.2M possible trigger-action pairs. Stage 2 performs selection and configuration with an LLM-based multi-agent pipeline made up of intent analysis, trigger selection, action selection via cross-schema scoring, and configuration verification. The agents coordinate through shared state and agreement-based selection. At the function level, where both the trigger and action functions must match the ground truth, FARM achieves 81% joint accuracy on Gold (62% on Noisy, 70% on One-shot). For comparison with service-level baselines, we map each function to its parent service and evaluate at the service level. There, FARM reaches 81% joint accuracy and improves over TARGE by 23 percentage points. FARM also generates ingredient-to-field bindings, producing executable automation configurations.
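The two evaluation levels can be illustrated with a minimal Python sketch. It assumes that predictions and gold labels are aligned lists of (trigger function, action function) pairs. The function IDs and the `PARENT` function-to-service mapping are hypothetical, and this is not FARM's actual evaluation code. The service-level score reuses the same joint-accuracy metric after each function is replaced by its parent service.

```python
from typing import Dict, List, Tuple

# A (trigger_function_id, action_function_id) pair; all IDs below are hypothetical.
Pair = Tuple[str, str]


def joint_accuracy(predicted: List[Pair], gold: List[Pair]) -> float:
    """Fraction of examples whose trigger AND action both match the gold pair."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must be aligned")
    if not gold:
        return 0.0
    hits = sum(1 for p, g in zip(predicted, gold) if p == g)
    return hits / len(gold)


def to_service_level(pairs: List[Pair], parent: Dict[str, str]) -> List[Pair]:
    """Replace each function ID with the ID of its parent service."""
    return [(parent[t], parent[a]) for t, a in pairs]


# Hypothetical function-to-service mapping.
PARENT = {
    "weather.rain_tomorrow": "weather",
    "weather.sunrise": "weather",
    "email.send_me_email": "email",
    "sms.send_sms": "sms",
}

gold = [("weather.rain_tomorrow", "email.send_me_email"),
        ("weather.sunrise", "sms.send_sms")]
pred = [("weather.sunrise", "email.send_me_email"),  # wrong trigger function, right service
        ("weather.sunrise", "sms.send_sms")]          # exact match

print(joint_accuracy(pred, gold))  # function level: 0.5
print(joint_accuracy(to_service_level(pred, PARENT),
                     to_service_level(gold, PARENT)))  # service level: 1.0
```

Every function-level match is also a service-level match, so for the same predictions the service-level joint accuracy can never be lower than the function-level one. The first example shows how choosing the wrong function within the right service is penalized only at the function level.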

翻译：诸如IFTTT和Zapier等触发-动作编程（TAP）平台通过跨异构服务组合事件驱动规则，实现了物联网（WoT）自动化。一个TAP小程序将触发器与动作相连接，且必须将触发器输出（成分）绑定到动作输入（字段）才能执行。先前的研究大多将TAP视为从自然语言进行的服务级预测，这通常会产生仍需手动配置的非可执行小程序。我们研究了功能级配置问题：生成具有正确成分到字段绑定的完整小程序。我们提出了FARM（字段感知解析模型），一种用于全配置自动化小程序生成的两阶段架构。第一阶段在模式增强表示上训练具有选择性层冻结的对比双编码器，从1,724个触发器功能和1,287个动作功能（220万种可能的触发器-动作组合）中检索候选。第二阶段使用基于LLM的多智能体流程执行选择与配置。该流程包括意图分析、触发器选择、通过跨模式评分的动作选择以及配置验证。智能体通过共享状态和基于共识的选择进行协调。在功能级别（要求触发器与动作功能均需与真实标注匹配），FARM在Gold数据集上实现了81%的联合准确率（Noisy数据集62%，One-shot数据集70%）。为与服务级基线进行比较，我们将功能映射至其父服务并在服务级别进行评估。FARM达到了81%的联合准确率，较TARGE提升了23个百分点。FARM还能生成成分到字段的绑定，从而产出可执行的自动化配置。