Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection

Abstract

Automatic detection and prevention of open-set failures are crucial in closed-loop robotic systems. Recent studies often struggle to do both at once: identify unexpected failures reactively after they occur, and prevent foreseeable ones proactively. To this end, we propose Code-as-Monitor (CaM), a novel paradigm that leverages a vision-language model (VLM) for both open-set reactive and proactive failure detection. The core of our method is to formulate both tasks as a unified set of spatio-temporal constraint satisfaction problems and to use VLM-generated code to evaluate them for real-time monitoring. To enhance the accuracy and efficiency of monitoring, we further introduce constraint elements that abstract constraint-related entities or their parts into compact geometric elements. This approach offers greater generality, simplifies tracking, and facilitates constraint-aware visual programming by using these elements as visual prompts. Experiments show that, compared to baselines across three simulators and a real-world setting, CaM achieves a 28.7% higher success rate and reduces execution time by 31.8% under severe disturbances. Moreover, CaM can be integrated with open-loop control policies to form closed-loop systems, enabling long-horizon tasks in cluttered scenes with dynamic environments.
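To make the idea of monitoring through constraint code more concrete, the following is a minimal sketch of what such a monitor might look like. It is not the authors' implementation or generated code. It assumes that constraint elements arrive as named 3D points per frame. The element names (`gripper`, `cup_center`, `cup_bottom`, `cup_top`), the thresholds, and the `patience` parameter are all hypothetical and chosen only for illustration. One constraint is reactive (the object is still held) and one is proactive (the cup stays upright so that it does not spill).

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]
# Hypothetical representation: each frame maps element names to tracked 3D points.
Elements = Dict[str, Point]


def tilt_from_vertical_deg(bottom: Point, top: Point) -> float:
    """Angle in degrees between the bottom->top line element and the world z-axis."""
    dx, dy, dz = (t - b for t, b in zip(top, bottom))
    norm = math.sqrt(dx * dx + dy * dy + dz * dz)
    if norm == 0.0:
        raise ValueError("degenerate line element")
    cos_angle = max(-1.0, min(1.0, dz / norm))
    return math.degrees(math.acos(cos_angle))


def object_held(e: Elements) -> bool:
    # Reactive constraint: the cup stays within 5 cm of the gripper (assumed threshold).
    return math.dist(e["gripper"], e["cup_center"]) < 0.05


def cup_upright(e: Elements) -> bool:
    # Proactive constraint: tilt stays below 15 degrees (assumed threshold).
    return tilt_from_vertical_deg(e["cup_bottom"], e["cup_top"]) < 15.0


MONITORS: Dict[str, Callable[[Elements], bool]] = {
    "object_held": object_held,
    "cup_upright": cup_upright,
}


def violated(frames: Sequence[Elements], patience: int = 2) -> List[str]:
    """Return the constraints that fail in `patience` consecutive frames."""
    streak = {name: 0 for name in MONITORS}
    failed: List[str] = []
    for e in frames:
        for name, check in MONITORS.items():
            streak[name] = 0 if check(e) else streak[name] + 1
            if streak[name] >= patience and name not in failed:
                failed.append(name)
    return failed
```

In this sketch the temporal part of a constraint is reduced to a consecutive-frame count, which filters out single-frame tracking noise. A closed-loop system could call `violated` on a sliding window of recent frames and trigger replanning when the returned list is non-empty.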
