While a variety of methods predict yield well from histogrammed remote sensing data, vision Transformers are only sparsely represented in the literature. This work tests the Convolutional vision Transformer (CvT) to evaluate vision Transformers, which currently achieve state-of-the-art results in many other vision tasks. CvT combines some of the advantages of convolution with the dynamic attention and global context fusion of Transformers. It performs worse than widely tested methods such as XGBoost and CNNs, but shows that Transformers have the potential to improve yield prediction.
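As an illustration of what "histogrammed remote sensing data" can look like as model input, the following is a minimal sketch only. It assumes each image is reduced to one normalized pixel-value histogram per spectral band, and that the histograms of several acquisition dates are stacked into a single array. The function name `band_histograms`, the bin count, the value range, and the synthetic data are all hypothetical; the source does not specify its preprocessing.

```python
import numpy as np


def band_histograms(image, n_bins=32, value_range=(0.0, 1.0)):
    """Reduce an (H, W, bands) image to a (bands, n_bins) array of
    normalized pixel-value histograms (hypothetical helper)."""
    n_bands = image.shape[-1]
    hists = np.zeros((n_bands, n_bins), dtype=np.float64)
    for b in range(n_bands):
        pixels = image[..., b].ravel()
        counts, _ = np.histogram(pixels, bins=n_bins, range=value_range)
        total = counts.sum()
        # Normalize so each histogram sums to 1; leave all-zero rows as zeros.
        if total > 0:
            hists[b] = counts / total
    return hists


# Synthetic stand-in for one season of imagery (assumed shapes):
# 10 acquisition dates, 64x64 pixels, 4 spectral bands, values in [0, 1).
rng = np.random.default_rng(0)
season = rng.random((10, 64, 64, 4))

# Stack the per-date histograms into one (bands, bins, time) array.
stack = np.stack([band_histograms(img) for img in season], axis=-1)
print(stack.shape)  # (4, 32, 10)
```

Under these assumptions, the resulting array discards the spatial layout of the pixels and keeps only their value distribution. That makes it usable both as a flattened feature vector for a method such as XGBoost and as an image-like tensor for a CNN or a vision Transformer.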