Users' physical safety is an increasing concern as the market for intelligent systems continues to grow, since unconstrained systems may recommend dangerous actions to users that can lead to serious injury. Covertly unsafe text is an area of particular interest, as such text may arise from everyday scenarios and is challenging to detect as harmful. We propose FARM, a novel framework that leverages external knowledge for trustworthy rationale generation in the context of safety. In particular, FARM foveates on missing knowledge to qualify the information required to reason in specific scenarios and retrieves this information with attribution to trustworthy sources. This knowledge is used both to classify the safety of the original text and to generate human-interpretable rationales, shedding light on the risk that systems pose to specific user groups, helping stakeholders manage the risks of their systems, and helping policymakers provide concrete safeguards for consumer safety. Our experiments show that FARM obtains state-of-the-art results on the SafeText dataset, with an absolute improvement of 5.9% in safety classification accuracy.