Previous theoretical results on meta-learning from sequences build on contrived assumptions and are somewhat convoluted. We introduce new information-theoretic tools that lead to an elegant and very general decomposition of error into three components: irreducible error, meta-learning error, and intra-task error. These tools unify analyses across many meta-learning challenges. To illustrate, we apply them to establish new results about in-context learning with transformers. Our theoretical results characterize how error decays in both the number of training sequences and the sequence length. They hold under weak assumptions; for example, they avoid the contrived mixing-time assumptions made by all prior results that establish decay of error with sequence length.