Computer Vision Transformers
Human vision has evolved over hundreds of millions of years, while human language is a far more recent innovation. Nonetheless, transformers, a deep learning architecture originally designed for natural language processing, are now taking computer vision by storm as well. In this talk I will highlight the limitations of blindly applying a tool designed for language to vision tasks, and present several inductive biases that make transformers effective on computer vision challenges such as 2D object segmentation, 3D object detection, and video activity detection.