Embodied Language Grounding With 3D Visual Feature Representations