CognitiveDrone: A VLA Model and Evaluation Benchmark for Real-Time Cognitive Task Solving and Reasoning in UAVs
March 3, 2025
Authors: Artem Lykov, Valerii Serpiva, Muhammad Haris Khan, Oleg Sautenkov, Artyom Myshlyaev, Grik Tadevosyan, Yasheerah Yaqoot, Dzmitry Tsetserukou
cs.AI
Abstract
This paper introduces CognitiveDrone, a novel Vision-Language-Action (VLA)
model tailored for complex Unmanned Aerial Vehicle (UAV) tasks that demand
advanced cognitive abilities. Trained on a dataset comprising over 8,000
simulated flight trajectories across three key categories (Human Recognition,
Symbol Understanding, and Reasoning), the model generates real-time 4D action
commands based on first-person visual inputs and textual instructions. To
further enhance performance in intricate scenarios, we propose
CognitiveDrone-R1, which integrates an additional Vision-Language Model (VLM)
reasoning module to simplify task directives prior to high-frequency control.
Experimental evaluations using our open-source benchmark, CognitiveDroneBench,
reveal that while a racing-oriented model (RaceVLA) achieves an overall success
rate of 31.3%, the base CognitiveDrone model reaches 59.6%, and
CognitiveDrone-R1 attains a success rate of 77.2%. These results demonstrate
improvements of up to 30% in critical cognitive tasks, underscoring the
effectiveness of incorporating advanced reasoning capabilities into UAV control
systems. Our contributions include the development of a state-of-the-art VLA
model for UAV control and the introduction of the first dedicated benchmark for
assessing cognitive tasks in drone operations. The complete repository is
available at cognitivedrone.github.io.
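
The two-stage design described in the abstract, a VLM reasoning step that simplifies the task directive followed by high-frequency VLA control, can be illustrated with a minimal Python sketch. This is not the authors' implementation. Every name here (`Action4D`, `run_episode`, `reason_every`, the stub functions) is hypothetical, the 4D action is assumed to be three linear velocities plus a yaw rate, and the interval at which the reasoner is refreshed is an assumption; the abstract states only that the directive is simplified before high-frequency control.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Action4D:
    """Hypothetical 4D command; the field layout is an assumption."""
    vx: float
    vy: float
    vz: float
    yaw_rate: float


# Both callables stand in for real models and are placeholders.
Reasoner = Callable[[str, object], str]      # (instruction, frame) -> simplified instruction
Policy = Callable[[str, object], Action4D]   # (instruction, frame) -> action


def run_episode(
    instruction: str,
    frames: Iterable[object],
    reasoner: Reasoner,
    policy: Policy,
    reason_every: int = 10,
) -> List[Action4D]:
    """Run the slow reasoner every `reason_every` frames and the fast policy on every frame."""
    actions: List[Action4D] = []
    simplified = instruction
    for step, frame in enumerate(frames):
        if step % reason_every == 0:
            # Slow path: rewrite the cognitive task as a direct directive.
            simplified = reasoner(instruction, frame)
        # Fast path: the policy only ever sees the simplified directive.
        actions.append(policy(simplified, frame))
    return actions


def stub_reasoner(instruction: str, frame: object) -> str:
    """Placeholder for a VLM call; always returns the same directive."""
    return "fly through the left gate"


def stub_policy(instruction: str, frame: object) -> Action4D:
    """Placeholder for a VLA call; always commands slow forward flight."""
    return Action4D(vx=1.0, vy=0.0, vz=0.0, yaw_rate=0.0)


if __name__ == "__main__":
    acts = run_episode("fly to the gate with the odd number", range(25),
                       stub_reasoner, stub_policy)
    print(len(acts), acts[0])
```

Keeping the reasoner outside the per-frame path means the control loop's latency depends only on the policy call, which is one plausible reading of why the directive is simplified ahead of control.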