CLIP: Contrastive Language-Image Pre-training
In this video, I describe the CLIP model published by OpenAI. CLIP is based on natural language supervision for pre-training. Natural language supervision is not new; in fact, there are two main approaches to it. One approach tries to predict the exact caption for each image, while the other is based on a contrastive loss: instead of predicting the exact caption, it tries to increase the similarity of correct image-text pairs relative to incorrect ones, as sketched below.
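To make the contrastive idea concrete, here is a minimal sketch of a CLIP-style symmetric contrastive loss in PyTorch. It assumes you already have image and text embeddings from some pair of encoders; the fixed temperature value and the embedding dimension are illustrative assumptions, not the exact settings from the paper.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    # L2-normalize embeddings so the dot product is a cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # Pairwise similarity matrix: entry (i, j) compares image i with text j.
    logits = image_emb @ text_emb.t() / temperature

    # The correct pairing is the diagonal: image i belongs with text i.
    targets = torch.arange(logits.size(0), device=logits.device)

    # Symmetric cross-entropy: classify the right text for each image,
    # and the right image for each text, then average the two losses.
    loss_img = F.cross_entropy(logits, targets)
    loss_txt = F.cross_entropy(logits.t(), targets)
    return (loss_img + loss_txt) / 2

# Example usage with random embeddings for a batch of 8 pairs
# (hypothetical 512-dimensional embedding space).
if __name__ == "__main__":
    images = torch.randn(8, 512)
    texts = torch.randn(8, 512)
    print(clip_contrastive_loss(images, texts))
```

Pushing correct pairs up the diagonal and everything else down is what lets the model learn from loose image-text pairs scraped from the web, rather than needing a predefined set of labels.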