CLEA: Closed-Loop Embodied Agent for Enhancing Task Execution in Dynamic Environments

Abstract

Large Language Models (LLMs) exhibit remarkable capabilities in the hierarchical decomposition of complex tasks through semantic reasoning. However, their application in embodied systems faces two challenges: ensuring reliable execution of subtask sequences, and achieving one-shot success on long-term tasks. To address these limitations in dynamic environments, we propose the Closed-Loop Embodied Agent (CLEA), a novel architecture that incorporates four specialized, functionally decoupled open-source LLMs for closed-loop task management. The framework features two core innovations: (1) an interactive task planner that dynamically generates executable subtasks based on the environmental memory, and (2) a multimodal execution critic that uses an evaluation framework to assess action feasibility probabilistically and triggers hierarchical re-planning when environmental perturbations exceed preset thresholds. To validate CLEA's effectiveness, we conduct experiments in a real environment with manipulable objects, using two heterogeneous robots for object search, manipulation, and integrated search-manipulation tasks. Across 12 task trials, CLEA outperforms the baseline model, achieving a 67.3% improvement in success rate and a 52.8% increase in task completion rate. These results demonstrate that CLEA significantly enhances the robustness of task planning and execution in dynamic environments.
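The closed-loop idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Memory`, `run_closed_loop`, `plan`, `feasibility`, and `execute` are hypothetical, the planner and critic are plain callables standing in for LLM calls, and the re-planning trigger is simplified to "feasibility score below a fixed threshold", which is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Memory:
    """Hypothetical stand-in for the environmental memory: a log of observations."""
    observations: List[str] = field(default_factory=list)


def run_closed_loop(
    plan: Callable[[Memory], List[str]],
    feasibility: Callable[[str, Memory], float],
    execute: Callable[[str], str],
    memory: Memory,
    threshold: float = 0.5,   # assumed preset threshold
    max_replans: int = 3,     # assumed safety limit, not from the source
) -> List[str]:
    """Run subtasks one at a time, re-planning when the critic rejects one."""
    executed: List[str] = []
    replans = 0
    queue = plan(memory)  # planner proposes subtasks from the current memory
    while queue:
        subtask = queue[0]
        # Critic scores the next subtask before it is executed.
        if feasibility(subtask, memory) < threshold:
            if replans >= max_replans:
                raise RuntimeError(f"gave up after {replans} re-plans")
            replans += 1
            # Record the rejection so the planner can take it into account.
            memory.observations.append(f"rejected: {subtask}")
            queue = plan(memory)
            continue
        queue.pop(0)
        # Feed the execution result back into memory to close the loop.
        memory.observations.append(execute(subtask))
        executed.append(subtask)
    return executed
```

The point of the sketch is the feedback path: every execution result and every rejection is written back to memory before the planner is consulted again, so the plan is revised from observed state rather than fixed up front.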

