Are LSTMs Good Few-Shot Learners?

Deep learning requires large amounts of data to learn new tasks well, limiting its applicability to domains where such data is available. Meta-learning overcomes this limitation by learning how to learn. In 2001, Hochreiter et al. showed that an LSTM trained with backpropagation across different tasks is capable of meta-learning. Despite promising results of this approach on small problems and, more recently, on reinforcement learning problems, it has received little attention in the supervised few-shot learning setting. We revisit this approach and test it on modern few-shot learning benchmarks. We find that LSTMs, surprisingly, outperform the popular meta-learning technique MAML on a simple few-shot sine wave regression benchmark, but that LSTMs, as expected, fall short on more complex few-shot image classification benchmarks. We identify two potential causes and propose a new method called Outer Product LSTM (OP-LSTM) that resolves these issues and displays substantial performance gains over the plain LSTM. Compared to popular meta-learning baselines, OP-LSTM yields competitive performance on within-domain few-shot image classification, and performs better in cross-domain settings by 0.5% to 1.9% in accuracy. While these results alone do not set a new state of the art, the advances of OP-LSTM are orthogonal to other advances in the field of meta-learning and yield new insights into how LSTMs work in image classification, opening up a range of new research directions. For reproducibility, we make all our research code publicly available.
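To make the few-shot sine wave regression setting concrete, the following is a minimal sketch of how one task could be sampled and packed into a sequence that a recurrent learner such as an LSTM can read. It is not the authors' code. The amplitude, phase, and input ranges follow the commonly used MAML sine benchmark and are assumed here; the function names `sample_sine_task` and `build_lstm_sequence`, the support and query sizes, and the three-channel input encoding (input, label, "label present" flag) are all hypothetical choices made for illustration.

```python
import numpy as np


def sample_sine_task(rng, k_support=5, k_query=10):
    """Sample one regression task y = A * sin(x + phase).

    Ranges are assumptions borrowed from the common MAML sine benchmark.
    """
    amplitude = rng.uniform(0.1, 5.0)
    phase = rng.uniform(0.0, np.pi)
    x = rng.uniform(-5.0, 5.0, size=(k_support + k_query, 1))
    y = amplitude * np.sin(x + phase)
    support = (x[:k_support], y[:k_support])
    query = (x[k_support:], y[k_support:])
    return support, query


def build_lstm_sequence(support, x_query):
    """Pack the support set and one query input into a single sequence.

    Hypothetical encoding: each support step is (x_i, y_i, 1) and the
    final query step is (x_q, 0, 0), so the flag marks whether a label
    is present. The recurrent model would predict y_q at the last step.
    """
    xs, ys = support
    support_steps = np.concatenate([xs, ys, np.ones_like(xs)], axis=1)
    query_step = np.array([[x_query.item(), 0.0, 0.0]])
    return np.concatenate([support_steps, query_step], axis=0)


rng = np.random.default_rng(0)
support, query = sample_sine_task(rng)
seq = build_lstm_sequence(support, query[0][0])
print(seq.shape)  # (6, 3): five labelled support steps plus one query step
```

In this framing, the hidden state of the recurrent network plays the role of the learner: it absorbs the labelled examples step by step, and meta-training across many sampled tasks would shape how it does so.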


