Major TOM: Expandable Datasets for Earth Observation

Deep learning models are increasingly data-hungry, requiring significant resources to collect and compile the datasets needed to train them, with Earth Observation (EO) models being no exception. However, the landscape of datasets in EO is relatively atomised, with interoperability made difficult by diverse formats and data structures. If ever-larger datasets are to be built, and duplication of effort minimised, then a shared framework that allows users to combine and access multiple datasets is needed. Here, Major TOM (Terrestrial Observation Metaset) is proposed as this extensible framework. It consists primarily of a geographical indexing system based on a set of grid points and a metadata structure that allows multiple datasets from different sources to be merged. Besides the specification of Major TOM as a framework, this work also presents a large, open-access dataset, MajorTOM-Core, which covers the vast majority of the Earth's land surface. This dataset provides the community with an immediately useful resource and also acts as a template for future additions to the Major TOM ecosystem. Access: https://huggingface.co/Major-TOM
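
The abstract names two components, a grid-point index and a metadata structure for merging sources, without giving their specification. The following is a minimal sketch of how such a pairing could work, not the Major TOM definition: the 10 km spacing, the cosine-scaled column count, the function names `grid_cell` and `merge_by_cell`, and the record fields `lat`, `lon`, and `product_id` are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant for this sketch)


def grid_cell(lat, lon, spacing_km=10.0):
    """Map a latitude/longitude to a hypothetical (row, col) grid cell."""
    # Rows are equally spaced in latitude.
    deg_per_row = math.degrees(spacing_km / EARTH_RADIUS_KM)
    n_rows = math.ceil(180.0 / deg_per_row)
    row = min(int((lat + 90.0) / deg_per_row), n_rows - 1)

    # The number of columns in a row shrinks with cos(latitude), so that
    # cells keep a roughly constant width in kilometres.
    row_center_lat = -90.0 + (row + 0.5) * deg_per_row
    circumference_km = (
        2 * math.pi * EARTH_RADIUS_KM * math.cos(math.radians(row_center_lat))
    )
    n_cols = max(1, int(circumference_km / spacing_km))
    col = min(int(((lon + 180.0) % 360.0) / 360.0 * n_cols), n_cols - 1)
    return (row, col)


def merge_by_cell(*sources):
    """Merge metadata from several (name, records) sources, keyed by grid cell."""
    merged = {}
    for name, records in sources:
        for rec in records:
            cell = grid_cell(rec["lat"], rec["lon"])
            # Each cell collects one product id per source name.
            merged.setdefault(cell, {})[name] = rec["product_id"]
    return merged


# Two made-up sources with samples a few hundred metres apart.
optical = [{"lat": 48.900, "lon": 2.350, "product_id": "optical-0001"}]
radar = [{"lat": 48.901, "lon": 2.351, "product_id": "radar-0001"}]

merged = merge_by_cell(("optical", optical), ("radar", radar))
for cell, products in merged.items():
    print(cell, products)
```

Because both sources are keyed by the same cell identifier, adding another dataset only requires computing the cell for each of its samples; no source needs to know about the others' formats. Under the assumptions above, the two nearby samples land in one shared cell.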

