RT-2 combines the power of high-capacity vision-language models (VLMs) with real-world robotic data, enabling robots to understand and execute complex instructions with ease. This cutting-edge model has been fine-tuned to interpret natural language commands, process visual cues, and perform a wide range of tasks - from object recognition to long-horizon planning.
Ещё видео!