The growing reliance on the internet has led to a proliferation of diverse web browsers and operating systems (OSs) capable of browsing the web. User agent strings (UASs) are a component of web browsing and are transmitted with every Hypertext Transfer Protocol (HTTP) request. They contain information about the client device and software, which web servers use for purposes such as content negotiation and security. However, because UAS formats are not standardized and browsers and devices continue to proliferate, parsing UASs is a non-trivial task. Current rules-based approaches are often brittle and can fail when they encounter non-standard formats. In this work, a novel methodology for parsing UASs using multi-headed attention-based transformers is proposed. The proposed methodology exhibits strong performance in parsing a variety of UASs with differing formats. Furthermore, a framework that uses parsed UASs to estimate vulnerability scores for large sections of publicly visible IT networks or regions is also discussed. The methodology presented here can also be easily extended or deployed for real-time parsing of logs in enterprise settings.
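To make the brittleness of rules-based parsing concrete, the following minimal sketch applies a single hand-written rule to two UASs. It is an illustration only, not the method proposed in this work: the rule `BROWSER_RULE`, the helper `parse_browser`, and the second, non-standard string are all hypothetical and chosen for the example.

```python
import re
from typing import Optional

# Hypothetical rule: expects a "<Browser>/<version>" token for a few known browsers.
BROWSER_RULE = re.compile(r"(?P<name>Firefox|Chrome)/(?P<version>\d+(?:\.\d+)*)")


def parse_browser(uas: str) -> Optional[dict]:
    """Return the browser name and version, or None if the rule does not match."""
    match = BROWSER_RULE.search(uas)
    if match is None:
        return None
    return {"name": match.group("name"), "version": match.group("version")}


# A UAS in the common "Product/version" layout.
standard = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"

# A made-up variant: lowercase product name, space instead of a slash.
nonstandard = "Mozilla/5.0 (X11; Linux x86_64) firefox 115.0"

print(parse_browser(standard))     # the rule matches the Firefox token
print(parse_browser(nonstandard))  # the rule finds nothing and returns None
```

A small change in casing or delimiter is enough to defeat the rule, and every such variant needs another hand-written pattern. This is the maintenance burden that motivates a learned parser.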