Learning Representations by Predicting Bags of Visual Words