Collectionless Artificial Intelligence

By and large, the professional handling of huge data collections is regarded as a fundamental ingredient of the progress of machine learning and of its spectacular results in related disciplines, while there is growing agreement on the risks connected to the centralization of such data collections. This paper argues that the time has come to think of new learning protocols in which machines acquire cognitive skills in a truly human-like context centered on environmental interactions. This imposes specific restrictions on the learning protocol, in accordance with the collectionless principle: at each time instant, data acquired from the environment is processed with the purpose of updating the current internal representation of the environment, and the agent is not given the privilege of recording the temporal stream. Basically, there is no permission to store the temporal information coming from the sensors, which promotes the development of self-organized memorization skills at a more abstract level, instead of relying on bare storage to simulate the learning dynamics typical of offline learning algorithms. This purposely extreme position is intended to stimulate the development of machines that learn to dynamically organize information by following human-based schemes. The challenge calls for new foundations for the computational processes of learning and reasoning, which might open the doors to a truly orthogonal, competitive track of AI technologies that avoid data accumulation by design, thus offering a framework better suited to privacy, control, and customizability. Finally, by pushing towards massively distributed computation, the collectionless approach to AI will likely reduce the concentration of power in companies and governments, thus better addressing geopolitical issues.
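As a rough illustration of the collectionless principle stated in the abstract, the following minimal sketch shows an agent that processes each incoming sample once, uses it to update its internal state, and then discards it. Everything here is an assumption made for illustration and is not taken from the paper: the linear model, the least-mean-squares update, and the names `CollectionlessLearner`, `sensor_stream`, and `run` are hypothetical, and the synthetic stream merely stands in for sensor data.

```python
import random
from typing import Iterator, List, Tuple

Sample = Tuple[Tuple[float, float], float]


class CollectionlessLearner:
    """Hypothetical online linear model: its only memory is its parameters."""

    def __init__(self, n_features: int, lr: float = 0.05) -> None:
        self.w: List[float] = [0.0] * n_features  # internal representation
        self.b: float = 0.0
        self.lr = lr

    def predict(self, x: Tuple[float, ...]) -> float:
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def step(self, x: Tuple[float, ...], y: float) -> float:
        """Update the parameters from one sample; the sample is not stored."""
        err = self.predict(x) - y
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err
        return err * err


def sensor_stream(n_steps: int, seed: int = 0) -> Iterator[Sample]:
    """Assumed toy environment emitting (x, y) with y = 2*x0 - x1 + 0.5."""
    rng = random.Random(seed)
    for _ in range(n_steps):
        x = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
        yield x, 2.0 * x[0] - x[1] + 0.5


def run(n_steps: int = 5000) -> CollectionlessLearner:
    learner = CollectionlessLearner(n_features=2)
    for x, y in sensor_stream(n_steps):
        learner.step(x, y)  # no buffer, no replay: x and y are dropped here
    return learner


if __name__ == "__main__":
    learner = run()
    print(learner.w, learner.b)
```

The point of the sketch is only the shape of the loop: there is no dataset, replay buffer, or log of past inputs, so whatever the agent retains must live in its parameters. The abstract's proposal concerns far richer, self-organized memorization than this toy regression.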
