In Visual Question Answering, given an image and a free-form natural language question about the image (e.g., "What kind of store is this?", "How many people are waiting in the queue?", "Is it safe to cross the street?"), the machine's task is to automatically produce a concise, accurate, free-form, natural language answer ("bakery", "5", "Yes").