Self-Attention in Transformer Neural Networks (with Code!)