Companies with an online presence, and exclusively digital companies in particular, often follow the same business model: collect data from the user base, then expose that data to advertising agencies in order to turn a profit. Such companies routinely market a service as "free" while obscuring the fact that they "charge" users in the currency of personal information rather than money. Online companies also gather user data for more principled purposes, such as improving the user experience and aggregating statistics; the problem lies in the sale of user data to third parties. In this work, we design an intelligent approach to online privacy protection that leverages supervised learning. By detecting and blocking data collection that might infringe on a user's privacy, we can restore a degree of digital privacy to the user. In our evaluation, we collect a dataset of network requests and measure the performance of several supervised classifiers. The results demonstrate the feasibility and potential of our approach.
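To make the idea of classifying network requests concrete, the following is a minimal sketch and not the implementation evaluated in this work. The abstract does not specify features, labels, or classifiers, so everything here is an assumption made for illustration: the URL tokenization, the naive Bayes model, the "block"/"allow" labels, and the toy example.com URLs are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(url):
    """Split a request URL into lowercase alphanumeric tokens."""
    return [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]


class NaiveBayesRequestClassifier:
    """Multinomial naive Bayes with add-one smoothing over URL tokens.

    A stand-in for "a supervised classifier"; the source does not say
    which classifiers were actually evaluated.
    """

    def fit(self, urls, labels):
        self.token_counts = defaultdict(Counter)  # label -> token frequencies
        self.label_counts = Counter(labels)       # label -> number of examples
        self.vocab = set()
        for url, label in zip(urls, labels):
            tokens = tokenize(url)
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, url):
        tokens = tokenize(url)
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n / total)
            for t in tokens:
                score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labeled requests; a real dataset would be far larger.
TRAIN = [
    ("https://ads.example.com/pixel?uid=123&track=1", "block"),
    ("https://metrics.example.net/collect?uid=9&event=click", "block"),
    ("https://tracker.example.org/beacon?uid=77", "block"),
    ("https://www.example.com/static/app.js", "allow"),
    ("https://www.example.com/images/logo.png", "allow"),
    ("https://api.example.com/articles?page=2", "allow"),
]

clf = NaiveBayesRequestClassifier().fit(*zip(*TRAIN))


def should_block(url):
    """Return True if the classifier labels the request as data collection."""
    return clf.predict(url) == "block"
```

In a deployment along the lines the abstract describes, a function such as `should_block` would sit in the request path (for example, in a browser extension or proxy) and drop requests predicted to collect user data, while letting the remaining traffic through.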