The widespread use of mobile apps is indisputable in this digital era. As large volumes of data are generated daily, user privacy and security have become a major concern. Many techniques, such as machine learning and deep learning traffic classifiers, are now applied to analyze users' app traffic. These techniques allow an observer to fingerprint the apps in use even while the user's traffic remains encrypted, which raises a severe privacy issue. To counter this type of analysis, researchers have studied obfuscation algorithms that confuse feature-based machine learning classifiers by camouflaging traffic through modification of the packet length distribution. Existing work achieves this by remapping the packet length distribution of the source app to that of a fake camouflage app. However, this approach lacks scalability and flexibility in practice, since it must pre-sample the target fake app's traffic before camouflage can be applied. In this paper, we propose a practical solution that uses a mathematical model to calculate the target distribution, while reducing the accuracy of the AppScanner mobile traffic classifier by at least 50 percent at the cost of roughly 20 percent overhead from packet modification.
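To make the idea of remapping packet lengths to a computed target distribution concrete, the following is a minimal sketch and not the method of the paper. The abstract does not specify the mathematical model, so the normal distribution, the 1500-byte MTU, and the names `remap_lengths` and `overhead` are all assumptions chosen for illustration. Each packet is padded up to the length at its rank's quantile in the model distribution, and the byte overhead of the padding is then measured.

```python
from statistics import NormalDist

MTU = 1500  # assumed maximum packet length in bytes


def remap_lengths(lengths, model):
    """Pad packets so their lengths roughly follow `model` (quantile matching).

    `model` is a hypothetical stand-in for the computed target distribution.
    """
    n = len(lengths)
    # Indices of the packets, ordered from shortest to longest.
    order = sorted(range(n), key=lambda i: lengths[i])
    out = list(lengths)
    for rank, i in enumerate(order):
        # Mid-rank quantile, always strictly between 0 and 1.
        q = (rank + 0.5) / n
        target = min(MTU, int(round(model.inv_cdf(q))))
        # Padding can only grow a packet; it never shrinks one.
        out[i] = max(lengths[i], target)
    return out


def overhead(original, padded):
    """Fraction of extra bytes added by padding (requires non-empty input)."""
    return (sum(padded) - sum(original)) / sum(original)


# Hypothetical source traffic and model parameters.
source = [100, 1400, 60, 800]
model = NormalDist(mu=900, sigma=300)
padded = remap_lengths(source, model)
print(padded, overhead(source, padded))
```

Because the target lengths come from a closed-form model rather than from captured traffic of a decoy app, no pre-sampling step is needed in this sketch; the trade-off is that the padding overhead depends entirely on how the model parameters are chosen.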