Intersectional Data and the Social Cost of Digital Extraction: A Pigouvian Surcharge

Contemporary digital capitalism relies on the large-scale extraction and commodification of personal data. Far from revealing isolated attributes, such data increasingly exposes intersectional social identities formed by combinations of attributes such as race, gender, and disability. This process generates a structural privacy externality: while firms appropriate economic value through profiling, prediction, and personalization, individuals and social groups bear diffuse costs in the form of heightened social risk, discrimination, and vulnerability. This paper develops a formal political-economy framework that internalizes these externalities by linking data valuation to information-theoretic measures. We propose a pricing rule based on mutual information that assigns a monetary value to the reduction in entropy that an individual data point induces in the joint distribution over intersectional identities. Interpreted as a Pigouvian-style surcharge on data extraction, this mechanism functions as an institutional constraint on the asymmetric accumulation of informational power. A key advantage of the approach is that it is model-agnostic: the valuation rule operates independently of the statistical structure used to estimate intersectional attributes, whether parametric, nonparametric, or machine-learned, and can be approximated by discretizing the joint distributions. We argue that regulators can calibrate this surcharge to reflect contested social values, thereby embedding normative judgments directly into market design. By formalizing the social cost of intersectional data extraction, the proposed mechanism offers both a corrective to market failure and a redistributive institutional shield for vulnerable groups under conditions of digital asymmetry.
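
To make the pricing rule concrete, the following is a minimal sketch, not the paper's own implementation. It assumes a discretized joint distribution over a data value D and an intersectional identity cell Z, measures the entropy reduction H(Z) - H(Z | D = d) in bits, and prices the expected reduction (the mutual information I(D; Z)) at a flat rate per bit. The function names, the `rate_per_bit` parameter, and the toy probabilities are all hypothetical and chosen only for illustration.

```python
import math

def entropy_bits(probs):
    """Shannon entropy, in bits, of a discrete probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def entropy_reduction(joint, d):
    """H(Z) - H(Z | D = d): bits of identity uncertainty removed by observing d.

    joint maps each data value d to a list of P(D = d, Z = z) over identity cells z.
    """
    n_cells = len(joint[d])
    # Marginal (prior) distribution over identity cells.
    prior = [sum(row[z] for row in joint.values()) for z in range(n_cells)]
    p_d = sum(joint[d])
    if p_d <= 0:
        raise ValueError("data value has zero probability")
    # Posterior over identity cells after observing d.
    posterior = [p / p_d for p in joint[d]]
    return entropy_bits(prior) - entropy_bits(posterior)

def mutual_information(joint):
    """I(D; Z): expected entropy reduction over all data values."""
    return sum(
        sum(row) * entropy_reduction(joint, d)
        for d, row in joint.items()
        if sum(row) > 0
    )

def surcharge(joint, rate_per_bit):
    """Hypothetical surcharge: an assumed regulator-set rate times I(D; Z)."""
    return rate_per_bit * mutual_information(joint)

# Toy example (illustrative numbers): four identity cells, one binary data point.
# Observing the data point rules out two of the four cells.
joint = {
    "d0": [0.25, 0.25, 0.0, 0.0],
    "d1": [0.0, 0.0, 0.25, 0.25],
}
bits = mutual_information(joint)
fee = surcharge(joint, rate_per_bit=0.10)
```

In this toy case the data point halves the set of plausible identity cells, so it removes one bit of uncertainty and the fee equals the assumed rate. A data point that is statistically independent of identity would remove zero bits and carry no surcharge. The per-bit rate is the lever a regulator would calibrate; the sketch says nothing about how the joint distribution itself is estimated.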
