NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
April 17, 2025
Authors: Xiangyan Liu, Jinjie Ni, Zijian Wu, Chao Du, Longxu Dou, Haonan Wang, Tianyu Pang, Michael Qizhe Shieh
cs.AI
Abstract
Recent advances in reinforcement learning (RL) have strengthened the
reasoning capabilities of vision-language models (VLMs). However, enhancing
policy exploration to more effectively scale test-time compute remains
underexplored in VLMs. In addition, VLMs continue to struggle with imperfect
visual perception, which in turn affects the subsequent reasoning process. To
this end, we propose NoisyRollout, a simple yet effective RL approach that
mixes trajectories from both clean and moderately distorted images to introduce
targeted diversity in visual perception and the resulting reasoning patterns.
Without additional training cost, NoisyRollout enhances the exploration
capabilities of VLMs by incorporating a vision-oriented inductive bias.
Furthermore, NoisyRollout employs a noise annealing schedule that gradually
reduces distortion strength over training, ensuring benefit from noisy signals
early while maintaining training stability and scalability in later stages.
With just 2.1K training samples, NoisyRollout achieves state-of-the-art
performance among open-source RL-tuned models on 5 out-of-domain benchmarks
spanning both reasoning and perception tasks, while preserving comparable or
even better in-domain performance.
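The two mechanisms named in the abstract, mixing rollouts from clean and distorted images and annealing the distortion strength over training, can be illustrated with a minimal sketch. This is not the authors' implementation: the linear decay shape, the Gaussian pixel noise, the even clean/noisy split, and every name below (`annealed_noise_strength`, `distort`, `collect_mixed_rollouts`, `rollout_fn`) are assumptions made for illustration only.

```python
import random


def annealed_noise_strength(step, total_steps, initial_strength=0.5):
    """Distortion strength at a given training step.

    Assumption: a linear decay from initial_strength to 0. The abstract only
    says the strength is gradually reduced, not which schedule is used.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    progress = min(max(step / total_steps, 0.0), 1.0)
    return initial_strength * (1.0 - progress)


def distort(pixels, strength, rng):
    """Add zero-mean Gaussian noise to pixel values in [0, 1], then clip.

    Assumption: Gaussian noise stands in for whatever distortion is used.
    """
    return [min(1.0, max(0.0, p + rng.gauss(0.0, strength))) for p in pixels]


def collect_mixed_rollouts(pixels, rollout_fn, step, total_steps,
                           n_clean=2, n_noisy=2, seed=0):
    """Gather rollouts from the clean image and from a distorted copy.

    rollout_fn is a hypothetical callable standing in for sampling one
    trajectory from the policy given an image.
    """
    rng = random.Random(seed)
    strength = annealed_noise_strength(step, total_steps)
    noisy_pixels = distort(pixels, strength, rng)
    rollouts = [("clean", rollout_fn(pixels)) for _ in range(n_clean)]
    rollouts += [("noisy", rollout_fn(noisy_pixels)) for _ in range(n_noisy)]
    return strength, rollouts
```

Under these assumptions, early steps draw half of each batch of rollouts from a visibly perturbed image, while by the final step the strength reaches zero and both halves see the same clean input.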