A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks

Abstract

In recent years, there has been a trend in the field of Reinforcement Learning (RL) towards large action models trained offline on large-scale datasets via sequence modeling. Existing models are primarily based on the Transformer architecture, which results in powerful agents. However, due to slow inference times, Transformer-based approaches are impractical for real-time applications, such as robotics. Recently, modern recurrent architectures such as xLSTM and Mamba have been proposed. Like the Transformer architecture, they benefit from parallelization during training, while also offering fast inference. In this work, we study the aptitude of these modern recurrent architectures for large action models. Consequently, we propose a Large Recurrent Action Model (LRAM) with an xLSTM at its core that comes with linear-time inference complexity and natural sequence length extrapolation abilities. Experiments on 432 tasks from 6 domains show that LRAM compares favorably to Transformers in terms of performance and speed.

