

Towards General-Purpose Model-Free Reinforcement Learning

January 27, 2025
Authors: Scott Fujimoto, Pierluca D'Oro, Amy Zhang, Yuandong Tian, Michael Rabbat
cs.AI

Abstract

Reinforcement learning (RL) promises a framework for near-universal problem-solving. In practice however, RL algorithms are often tailored to specific benchmarks, relying on carefully tuned hyperparameters and algorithmic choices. Recently, powerful model-based RL methods have shown impressive general results across benchmarks but come at the cost of increased complexity and slow run times, limiting their broader applicability. In this paper, we attempt to find a unifying model-free deep RL algorithm that can address a diverse class of domains and problem settings. To achieve this, we leverage model-based representations that approximately linearize the value function, taking advantage of the denser task objectives used by model-based RL while avoiding the costs associated with planning or simulated trajectories. We evaluate our algorithm, MR.Q, on a variety of common RL benchmarks with a single set of hyperparameters and show a competitive performance against domain-specific and general baselines, providing a concrete step towards building general-purpose model-free deep RL algorithms.
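The abstract's core technical idea is learning state-action representations under which the value function becomes approximately linear, i.e. Q(s, a) ≈ wᵀφ(s, a), with the embedding φ trained using the denser objectives of model-based RL. The sketch below illustrates only that linear-value structure; the encoder here is a fixed random projection standing in for a learned network, and all names and dimensions are illustrative assumptions, not the paper's actual MR.Q architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, ACTION_DIM, EMBED_DIM = 4, 2, 16

# Stand-in for a learned state-action encoder. In the paper's setting this
# would be trained with model-based objectives (e.g. dynamics and reward
# prediction); here it is just a fixed random projection for illustration.
W_enc = rng.normal(size=(STATE_DIM + ACTION_DIM, EMBED_DIM))

def phi(state, action):
    """Embed a state-action pair; tanh keeps the features bounded."""
    x = np.concatenate([state, action])
    return np.tanh(x @ W_enc)

# Linear value head: the value estimate is a dot product with the embedding,
# Q(s, a) = w . phi(s, a), so learning Q reduces to fitting a linear layer.
w = rng.normal(size=EMBED_DIM)

def q_value(state, action):
    return float(w @ phi(state, action))

s = rng.normal(size=STATE_DIM)
a = rng.normal(size=ACTION_DIM)
print(q_value(s, a))  # scalar value estimate for this state-action pair
```

Because the value head is linear in φ, value estimation avoids the planning or simulated-rollout costs of full model-based methods while still benefiting from model-based representation learning.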

