RL + Transformer = A General-Purpose Problem Solver
January 24, 2025
Authors: Micah Rentschler, Jesse Roberts
cs.AI
Abstract
What if artificial intelligence could not only solve problems for which it
was trained but also learn to teach itself to solve new problems (i.e.,
meta-learn)? In this study, we demonstrate that a pre-trained transformer
fine-tuned with reinforcement learning over multiple episodes develops the
ability to solve problems that it has never encountered before - an emergent
ability called In-Context Reinforcement Learning (ICRL). This powerful
meta-learner not only excels in solving unseen in-distribution environments
with remarkable sample efficiency, but also shows strong performance in
out-of-distribution environments. In addition, we show that it exhibits
robustness to the quality of its training data, seamlessly stitches together
behaviors from its context, and adapts to non-stationary environments. These
behaviors demonstrate that an RL-trained transformer can iteratively improve
upon its own solutions, making it an excellent general-purpose problem solver.