Towards General-Purpose Model-Free Reinforcement Learning

Abstract

Reinforcement learning (RL) promises a framework for near-universal problem-solving. In practice, however, RL algorithms are often tailored to specific benchmarks, relying on carefully tuned hyperparameters and algorithmic choices. Recently, powerful model-based RL methods have shown impressive general results across benchmarks, but they come at the cost of increased complexity and slow run times, which limits their broader applicability. In this paper, we attempt to find a unifying model-free deep RL algorithm that can address a diverse class of domains and problem settings. To achieve this, we leverage model-based representations that approximately linearize the value function, taking advantage of the denser task objectives used by model-based RL while avoiding the costs associated with planning or simulated trajectories. We evaluate our algorithm, MR.Q, on a variety of common RL benchmarks with a single set of hyperparameters and show competitive performance against domain-specific and general baselines, providing a concrete step towards building general-purpose model-free deep RL algorithms.
