Zero-shot Model-based Reinforcement Learning using Large Language Models

Abstract

The emerging zero-shot capabilities of Large Language Models (LLMs) have led to their application in areas well beyond natural language processing. In reinforcement learning, LLMs have been used extensively in text-based environments, but their integration with continuous state spaces remains understudied. In this paper, we investigate how pre-trained LLMs can be leveraged to predict, in context, the dynamics of continuous Markov decision processes. We identify handling multivariate data and incorporating the control signal as the key challenges that limit the deployment of LLMs in this setting, and we propose Disentangled In-Context Learning (DICL) to address them. We present proof-of-concept applications in two reinforcement learning settings, model-based policy evaluation and data-augmented off-policy reinforcement learning, supported by a theoretical analysis of the proposed methods. Our experiments further demonstrate that our approach produces well-calibrated uncertainty estimates. We release the code at https://github.com/abenechehab/dicl.
