Datasheets for Machine Learning Sensors: Towards Transparency, Auditability, and Responsibility for Intelligent Sensing

Matthew Stewart, Pete Warden, Yasmine Omri, Shvetank Prakash, Joao Santos, Shawn Hymel, Benjamin Brown, Jim MacArthur, Nat Jeffries, Sachin Katti, Brian Plancher, Vijay Janapa Reddi

Machine learning (ML) sensors are enabling intelligence at the edge by empowering end-users with greater control over their data. ML sensors offer a new sensing paradigm that moves processing and analysis onto the device itself rather than relying on the cloud, bringing benefits such as lower latency and greater data privacy. The rise of these intelligent edge devices, while revolutionizing areas like the Internet of Things (IoT) and healthcare, also raises critical questions about privacy, security, and the opacity of AI decision-making. As ML sensors become more pervasive, they require judicious governance regarding transparency, accountability, and fairness. To this end, we introduce a standard datasheet template for ML sensors and discuss and evaluate in detail the design and motivation for each section of the datasheet. These sections include standard datasheet components such as the system's hardware specifications, IoT and AI components such as the ML model and dataset attributes, and novel components such as end-to-end performance metrics and expanded environmental impact metrics. As a case study of our datasheet template in practice, we also designed and developed two example ML sensors that perform computer vision-based person detection: an open-source ML sensor designed and developed in-house, and a commercial ML sensor developed by our industry collaborators. Together, ML sensors and their datasheets provide greater privacy, security, transparency, explainability, auditability, and user-friendliness for ML-enabled embedded systems. We conclude by emphasizing the need to standardize datasheets across the broader ML community to ensure the responsible use of sensor data.

翻译：机器学习（ML）传感器通过赋予终端用户对数据的更大控制权，正在边缘端实现智能能力。ML传感器提供了一种新的传感范式，将处理与分析过程迁移至设备本身而非依赖云端，从而带来更低延迟和更高数据隐私等优势。这些智能边缘设备的兴起，在彻底改变物联网（IoT）和医疗健康等领域的同时，也引发了关于隐私、安全以及人工智能决策不透明性的关键问题。随着ML传感器日益普及，迫切需要对其透明性、问责性和公平性进行审慎治理。为此，我们为这些ML传感器引入一套标准数据表模板，并详细讨论和评估了数据表各节的设计动机，包括：标准数据表组件（如系统硬件规格）、物联网与人工智能组件（如ML模型与数据集属性），以及新型组件（如端到端性能指标和扩展的环境影响指标）。为提供数据表模板的应用案例，我们还设计并开发了两个面向计算机视觉人体检测的ML传感器示例：一个为内部设计开发的开源ML传感器，另一个为行业合作者开发的商用ML传感器。通过ML传感器及其数据表的结合，可为ML赋能的嵌入式系统提供更高的隐私性、安全性、透明性、可解释性、可审计性和用户友好性。最后，我们强调需要在更广泛的ML社区中推动数据表的标准化，以确保传感器数据的负责任使用。