The connectivity of the World Wide Web owes much to HTTP, whose messages carry informative header fields of interest to disciplines such as web security and privacy, particularly in the study of web tracking. Existing research has used HTTP/S request messages to identify web trackers, but HTTP/S response headers are often overlooked. This study designs machine learning classifiers that detect web trackers from HTTP/S response headers. Our data set consists of traffic from the Chrome, Firefox, and Brave browsers, collected with the traffic-monitoring browser extension T.EX. Eleven supervised models were trained on Chrome data and tested on all three browsers. The models achieved high accuracy, F1-score, precision, and recall, along with low log-loss, on Chrome and Firefox, but performed poorly on Brave, potentially because of its distinct data distribution and feature set. The results suggest that these classifiers are viable for detecting web trackers in Chrome and Firefox. However, they have not yet been tested in a real-world application, and future studies could distinguish between tracker types and draw on broader label sources.
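For readers unfamiliar with the evaluation metrics named above, the following is a minimal sketch of how accuracy, precision, recall, F1-score, and log-loss can be computed for a binary tracker/non-tracker classifier. It is not the study's implementation. The function name `binary_metrics`, the 0.5 decision threshold, and the toy labels and probabilities are all hypothetical and chosen only for illustration.

```python
import math


def binary_metrics(y_true, y_prob, threshold=0.5, eps=1e-15):
    """Compute common binary classification metrics.

    y_true: list of 0/1 labels (1 = tracker, 0 = non-tracker; assumed encoding).
    y_prob: list of predicted probabilities for class 1.
    """
    # Turn probabilities into hard predictions with an assumed threshold.
    y_pred = [1 if p >= threshold else 0 for p in y_prob]

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Log-loss (binary cross-entropy); clip probabilities to avoid log(0).
    clipped = [min(max(p, eps), 1 - eps) for p in y_prob]
    log_loss = -sum(
        t * math.log(p) + (1 - t) * math.log(1 - p)
        for t, p in zip(y_true, clipped)
    ) / len(y_true)

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "log_loss": log_loss,
    }


# Hypothetical toy data, not taken from the study.
y_true = [1, 0, 1, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.1, 0.7]
metrics = binary_metrics(y_true, y_prob)
print(metrics)
```

In a cross-browser evaluation like the one described, a function of this kind would be applied separately to the predictions for each browser's test set, so that a drop on one browser (such as Brave) is visible per metric.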