Protecting User Privacy in Online Settings via Supervised Learning

Companies with an online presence, in particular companies that are exclusively digital, often follow this business model: collect data from the user base, then expose that data to advertising agencies to turn a profit. Such companies routinely market a service as "free" while obscuring the fact that they "charge" users in the currency of personal information rather than money. Online companies also gather user data for more principled purposes, however, such as improving the user experience and aggregating statistics; the problem lies in the sale of user data to third parties. In this work, we design an intelligent approach to online privacy protection that leverages supervised learning. By detecting and blocking data collection that might infringe on a user's privacy, we can restore a degree of digital privacy to the user. In our evaluation, we collect a dataset of network requests and measure the performance of several supervised classifiers. The results demonstrate the feasibility and potential of our approach.
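The abstract does not name the features or classifiers used, so the following is only a minimal sketch of the general idea of labeling network requests with a supervised classifier and blocking the ones predicted to collect user data. Everything in it is an assumption made for illustration: the toy URLs, the "tracking" and "benign" labels, the URL tokenization, the choice of a multinomial naive Bayes model, and the names `NaiveBayesRequestClassifier` and `should_block`.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(url):
    """Split a request URL into lowercase alphanumeric tokens."""
    return [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]


class NaiveBayesRequestClassifier:
    """Multinomial naive Bayes over URL tokens (illustrative, not the paper's model)."""

    def fit(self, urls, labels):
        self.class_counts = Counter(labels)       # requests seen per label
        self.token_counts = defaultdict(Counter)  # token frequencies per label
        for url, label in zip(urls, labels):
            self.token_counts[label].update(tokenize(url))
        self.vocab = {t for counts in self.token_counts.values() for t in counts}
        return self

    def predict(self, url):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            # Log prior plus Laplace-smoothed log likelihood of each token.
            score = math.log(n / total)
            denom = sum(self.token_counts[label].values()) + len(self.vocab)
            for tok in tokenize(url):
                score += math.log((self.token_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labeled requests; a real dataset would be far larger.
TRAIN = [
    ("https://ads.example.com/pixel?uid=123&track=1", "tracking"),
    ("https://metrics.example.net/collect?uid=9&event=click", "tracking"),
    ("https://tracker.example.org/beacon?uid=77", "tracking"),
    ("https://shop.example.com/static/app.js", "benign"),
    ("https://shop.example.com/images/logo.png", "benign"),
    ("https://news.example.org/static/style.css", "benign"),
]

clf = NaiveBayesRequestClassifier().fit(
    [url for url, _ in TRAIN], [label for _, label in TRAIN]
)


def should_block(url):
    """Block a request when the classifier predicts it collects user data."""
    return clf.predict(url) == "tracking"
```

In a deployment along these lines, `should_block` would be called on each outgoing request before it leaves the browser; measuring how often its predictions match held-out labels corresponds to the kind of evaluation the abstract describes.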


