CognitiveDrone: A VLA Model and Evaluation Benchmark for Real-Time Cognitive Task Solving and Reasoning in UAVs

Abstract

This paper introduces CognitiveDrone, a novel Vision-Language-Action (VLA) model tailored for complex Unmanned Aerial Vehicle (UAV) tasks that demand advanced cognitive abilities. Trained on a dataset of over 8,000 simulated flight trajectories across three key categories (Human Recognition, Symbol Understanding, and Reasoning), the model generates real-time 4D action commands from first-person visual inputs and textual instructions. To further improve performance in intricate scenarios, we propose CognitiveDrone-R1, which integrates an additional Vision-Language Model (VLM) reasoning module that simplifies task directives before they reach high-frequency control. Experimental evaluations on our open-source benchmark, CognitiveDroneBench, show that a racing-oriented model (RaceVLA) achieves an overall success rate of 31.3%, the base CognitiveDrone model reaches 59.6%, and CognitiveDrone-R1 attains 77.2%. These results demonstrate improvements of up to 30% in critical cognitive tasks, underscoring the effectiveness of incorporating advanced reasoning capabilities into UAV control systems. Our contributions include the development of a state-of-the-art VLA model for UAV control and the introduction of the first dedicated benchmark for assessing cognitive tasks in drone operations. The complete repository is available at cognitivedrone.github.io.
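
The two-stage arrangement described for CognitiveDrone-R1, in which a slower reasoning step simplifies the instruction and a faster policy step emits 4D commands, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class `TwoStageController`, the callables `simplify` and `act`, the `reason_every` ratio, and the assumed action layout `(vx, vy, vz, yaw_rate)` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence, Tuple

# Assumed layout of a 4D command; the abstract does not specify the components.
Action = Tuple[float, float, float, float]  # (vx, vy, vz, yaw_rate)


@dataclass
class TwoStageController:
    """Hypothetical sketch: slow reasoning step feeding a fast control step."""

    simplify: Callable[[str, Any], str]  # stand-in for the VLM reasoning module
    act: Callable[[str, Any], Action]    # stand-in for the VLA policy
    reason_every: int = 10               # assumed control steps per reasoning step

    def run(self, instruction: str, frames: Sequence[Any]) -> List[Action]:
        actions: List[Action] = []
        directive = instruction
        for step, frame in enumerate(frames):
            # Refresh the simplified directive only at the slower rate.
            if step % self.reason_every == 0:
                directive = self.simplify(instruction, frame)
            # Emit one 4D command per frame at the faster rate.
            actions.append(self.act(directive, frame))
        return actions
```

In this sketch the policy sees only the simplified directive, so the cost of reasoning is paid once every `reason_every` frames rather than on every control step.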
