
Zero-shot Model-based Reinforcement Learning using Large Language Models

October 15, 2024
Authors: Abdelhakim Benechehab, Youssef Attia El Hili, Ambroise Odonnat, Oussama Zekri, Albert Thomas, Giuseppe Paolo, Maurizio Filippone, Ievgen Redko, Balázs Kégl
cs.AI

Abstract

The emerging zero-shot capabilities of Large Language Models (LLMs) have led to their applications in areas extending well beyond natural language processing tasks. In reinforcement learning, while LLMs have been extensively used in text-based environments, their integration with continuous state spaces remains understudied. In this paper, we investigate how pre-trained LLMs can be leveraged to predict in context the dynamics of continuous Markov decision processes. We identify handling multivariate data and incorporating the control signal as key challenges that limit the potential of LLMs' deployment in this setup and propose Disentangled In-Context Learning (DICL) to address them. We present proof-of-concept applications in two reinforcement learning settings: model-based policy evaluation and data-augmented off-policy reinforcement learning, supported by theoretical analysis of the proposed methods. Our experiments further demonstrate that our approach produces well-calibrated uncertainty estimates. We release the code at https://github.com/abenechehab/dicl.
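To make the idea concrete, below is a minimal, hedged sketch of one plausible instantiation of the pipeline the abstract describes: the multivariate (state, action) trajectory is disentangled with PCA, each latent component is forecast independently as a univariate series in the style of zero-shot LLM time-series forecasting, and the forecasts are mapped back to the original space. This is not the authors' exact implementation (see the released code for that); in particular, `llm_forecast_univariate` is a hypothetical placeholder for the call to a pretrained LLM, replaced here by a naive persistence forecast so the sketch runs offline.

```python
# Hedged sketch of a DICL-style forecasting loop (illustrative only).
import numpy as np
from sklearn.decomposition import PCA


def serialize_series(values: np.ndarray, decimals: int = 2) -> str:
    """Serialize a univariate series into a comma-separated digit string,
    the kind of prompt typically used for zero-shot LLM forecasting."""
    return ", ".join(f"{v:.{decimals}f}" for v in values)


def llm_forecast_univariate(history: np.ndarray, horizon: int) -> np.ndarray:
    """Hypothetical placeholder: a real system would send
    serialize_series(history) to a pretrained LLM and parse the continuation.
    Here we return a persistence forecast so the example is self-contained."""
    _ = serialize_series(history)  # prompt that would be sent to the LLM
    return np.repeat(history[-1], horizon)


def dicl_style_forecast(trajectory: np.ndarray, horizon: int, n_components: int) -> np.ndarray:
    """Forecast a multivariate trajectory by (1) projecting onto principal
    components to disentangle the dimensions, (2) forecasting each component
    independently in context, and (3) mapping back to the original space."""
    pca = PCA(n_components=n_components).fit(trajectory)
    latent = pca.transform(trajectory)  # shape (T, n_components)
    latent_forecast = np.stack(
        [llm_forecast_univariate(latent[:, k], horizon) for k in range(latent.shape[1])],
        axis=1,
    )  # shape (horizon, n_components)
    return pca.inverse_transform(latent_forecast)  # shape (horizon, dim)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy trajectory: concatenated (state, action) vectors over 50 steps.
    traj = rng.normal(size=(50, 6))
    print(dicl_style_forecast(traj, horizon=5, n_components=3).shape)  # (5, 6)
```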
