Closed-loop Long-horizon Robotic Planning via Equilibrium Sequence Modeling

Abstract

Task planning is a major challenge in making autonomous robots act: it requires translating high-level task descriptions into long-horizon action sequences. Despite recent advances in language model agents, they remain prone to planning errors and limited in their ability to plan ahead. To address these limitations in robotic planning, we advocate a self-refining scheme that iteratively refines a draft plan until an equilibrium is reached. Remarkably, this process can be optimized end-to-end from an analytical perspective, without curating additional verifiers or reward models, which allows us to train self-refining planners in a simple supervised learning fashion. In addition, we devise a nested equilibrium sequence modeling procedure for efficient closed-loop planning that incorporates useful feedback from the environment (or an internal world model). We evaluate our method on the VirtualHome-Env benchmark, where it shows advanced performance with better scaling with respect to inference computation. Code is available at https://github.com/Singularity0104/equilibrium-planner.
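The idea of refining a draft plan until an equilibrium is reached can be illustrated with a minimal fixed-point loop. This is only a sketch under stated assumptions, not the authors' implementation: the names `toy_refine`, `refine_to_equilibrium`, and the `PREREQ` table are hypothetical, and the rule-based refiner stands in for a learned language model planner.

```python
from typing import Callable, Dict, List, Tuple

Plan = List[str]

# Hypothetical prerequisite table; it stands in for a learned planner.
PREREQ: Dict[str, str] = {
    "grab milk": "open fridge",
    "open fridge": "walk to kitchen",
}


def toy_refine(task: str, plan: Plan) -> Plan:
    """Insert at most one missing prerequisite step.

    Returns an equal copy of the plan when nothing needs fixing,
    which is the fixed-point (equilibrium) condition.
    """
    for i, step in enumerate(plan):
        need = PREREQ.get(step)
        if need is not None and need not in plan[:i]:
            return plan[:i] + [need] + plan[i:]
    return list(plan)


def refine_to_equilibrium(
    task: str,
    draft: Plan,
    refine: Callable[[str, Plan], Plan],
    max_iters: int = 20,
) -> Tuple[Plan, int]:
    """Apply `refine` repeatedly until the plan stops changing.

    Returns the final plan and the number of refinements that changed it.
    """
    plan = list(draft)
    for n in range(max_iters):
        new_plan = refine(task, plan)
        if new_plan == plan:  # equilibrium: refine(plan) == plan
            return plan, n
        plan = new_plan
    return plan, max_iters  # iteration budget exhausted


if __name__ == "__main__":
    final_plan, steps = refine_to_equilibrium("get milk", ["grab milk"], toy_refine)
    print(final_plan, steps)
```

In this toy setting, raising `max_iters` corresponds to spending more inference computation on refinement; the loop stops early once the plan is a fixed point of the refiner.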
