Rethinking Provenance Completeness with a Learning-Based Linux Scheduler

Provenance plays a critical role in maintaining traceability of a system's actions, supporting root cause analysis of security threats and their impacts. Provenance collection is often incorporated into a system's reference monitor to ensure that an audit trail exists for all events, that events are completely captured, and that logging of such events cannot be bypassed. However, recent research has questioned whether existing state-of-the-art provenance collection systems actually provide the security guarantees of a true reference monitor. The concern is the 'super producer threat', in which provenance generation overloads a system, forcing it to drop security-relevant events and allowing an attacker to hide their actions. One approach to this threat is to enforce resource isolation, but that does not fully solve the problems arising from hardware dependencies and performance limitations. In this paper, we show how an operating system's kernel scheduler can mitigate this threat, and we introduce Aegis, a learned scheduler for Linux designed specifically for provenance. Unlike conventional schedulers, which ignore provenance completeness requirements, Aegis uses reinforcement learning to learn the behavior of provenance tasks and to dynamically optimize resource allocation. We evaluate Aegis's efficacy and show that it significantly improves both the completeness and efficiency of provenance collection systems compared to traditional scheduling, while maintaining reasonable overheads and, in certain cases, even improving overall runtime compared to the default Linux scheduler.
