RL + Transformer = A General-Purpose Problem Solver

Abstract

What if artificial intelligence could not only solve the problems it was trained on but also learn to teach itself to solve new ones (i.e., meta-learn)? In this study, we demonstrate that a pre-trained transformer fine-tuned with reinforcement learning over multiple episodes develops the ability to solve problems it has never encountered before, an emergent ability called In-Context Reinforcement Learning (ICRL). This powerful meta-learner not only solves unseen in-distribution environments with remarkable sample efficiency, but also performs strongly in out-of-distribution environments. In addition, we show that it is robust to the quality of its training data, seamlessly stitches together behaviors from its context, and adapts to non-stationary environments. These behaviors demonstrate that an RL-trained transformer can iteratively improve upon its own solutions, making it an excellent general-purpose problem solver.
