The professional handling of huge data collections is widely regarded as a fundamental ingredient of the progress of machine learning and of its spectacular results in related disciplines, and there is growing agreement on the risks connected to the centralization of such collections. This paper argues that the time has come to think of new learning protocols in which machines acquire cognitive skills in a truly human-like context centered on environmental interactions. This imposes specific restrictions on the learning protocol according to the collectionless principle, which states that, at each time instant, data acquired from the environment is processed with the purpose of updating the current internal representation of the environment, and that the agent is not given the privilege of recording the temporal stream. In other words, the agent has no permission to store the temporal information coming from its sensors, which promotes the development of self-organized memorization skills at a more abstract level instead of reliance on bare storage to simulate the learning dynamics typical of offline learning algorithms. This purposely extreme position is intended to stimulate the development of machines that learn to organize information dynamically by following human-based schemes. The challenge suggests developing new foundations for computational processes of learning and reasoning that might open the door to a truly orthogonal competitive track of AI technologies that avoid data accumulation by design, thus offering a framework better suited to privacy, control, and customizability. Finally, by pushing towards massively distributed computation, the collectionless approach to AI will likely reduce the concentration of power in companies and governments, and thus help address geopolitical issues.
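As a rough illustration of the collectionless principle stated above, the following minimal sketch processes a stream one sample at a time and keeps only a small internal state (model weights), never a buffer of past observations. It is not the paper's method: the least-mean-squares update, the class name `CollectionlessLearner`, and the toy `sensor_stream` environment are all assumptions chosen only to make the constraint concrete.

```python
from typing import Iterator, List, Tuple


class CollectionlessLearner:
    """Hypothetical online linear model; its only persistent state is its weights."""

    def __init__(self, n_features: int, learning_rate: float = 0.1) -> None:
        self.weights: List[float] = [0.0] * n_features
        self.bias: float = 0.0
        self.learning_rate = learning_rate

    def predict(self, x: List[float]) -> float:
        return sum(w * xi for w, xi in zip(self.weights, x)) + self.bias

    def observe(self, x: List[float], y: float) -> None:
        """Update the internal state from one sample; the sample is not retained."""
        error = y - self.predict(x)
        for i, xi in enumerate(x):
            self.weights[i] += self.learning_rate * error * xi
        self.bias += self.learning_rate * error


def sensor_stream(n_steps: int) -> Iterator[Tuple[List[float], float]]:
    """Assumed toy environment: emits (x, y) with y = 2x + 1, one pair at a time."""
    for t in range(n_steps):
        x = (t % 10) / 10.0  # cycles through 0.0, 0.1, ..., 0.9
        yield [x], 2.0 * x + 1.0


learner = CollectionlessLearner(n_features=1)
for x, y in sensor_stream(5000):
    learner.observe(x, y)  # each sample is seen once and then dropped
```

The point of the design is that `observe` is the only path by which data reaches the agent, and nothing in the class can replay earlier inputs. An offline algorithm, by contrast, would accumulate the stream into a dataset and iterate over it repeatedly.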