AlphaSpace: Enabling Robotic Actions through Semantic Tokenization and Symbolic Reasoning

Abstract

This paper presents AlphaSpace, a novel methodology designed to enhance the spatial reasoning capabilities of large language models (LLMs) for navigation in 3D Cartesian space. AlphaSpace employs a semantics-based tokenization strategy that encodes height information through specialized semantic tokens, and it incorporates synthetic reasoning data that is primarily symbolic. This approach enables LLMs to manipulate objects accurately by positioning them at specific [x, y, z] coordinates. Experimental results show that AlphaSpace significantly outperforms existing models on manipulation subtasks, achieving a total accuracy of 66.67%, compared to 37.5% for GPT-4o and 29.17% for Claude 3.5 Sonnet.
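
To make the idea of encoding a position, including height, as discrete tokens more concrete, the following is a minimal Python sketch. It is not AlphaSpace's actual token vocabulary, which the abstract does not specify: the token layout (`<|obj:...|>`, `<|x:...|>`, `<|y:...|>`, `<|height:...|>`) and the names `Placement`, `encode`, and `decode` are assumptions made purely for illustration.

```python
import re
from typing import NamedTuple


class Placement(NamedTuple):
    """An object name plus integer [x, y, z] coordinates (hypothetical type)."""
    name: str
    x: int
    y: int
    z: int


def encode(p: Placement) -> str:
    """Render a placement as a token string.

    The token format is an assumption for illustration only; height (z)
    gets its own dedicated token, separate from the planar coordinates.
    """
    return f"<|obj:{p.name}|><|x:{p.x}|><|y:{p.y}|><|height:{p.z}|>"


# Pattern matching exactly the hypothetical format produced by encode().
_PATTERN = re.compile(
    r"<\|obj:(\w+)\|><\|x:(-?\d+)\|><\|y:(-?\d+)\|><\|height:(-?\d+)\|>"
)


def decode(tokens: str) -> Placement:
    """Parse a token string back into a Placement, or raise ValueError."""
    match = _PATTERN.fullmatch(tokens)
    if match is None:
        raise ValueError(f"not a valid placement token string: {tokens!r}")
    name, x, y, z = match.groups()
    return Placement(name, int(x), int(y), int(z))


if __name__ == "__main__":
    target = Placement("red_cube", 3, 7, 2)
    text = encode(target)
    print(text)
    print(decode(text))
```

Because each coordinate is a separate token, a model's output in this format can be checked symbolically: decoding the string and comparing it with the target position gives an exact pass or fail per placement.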
