Fine-tune Multi-modal LLaVA Vision and Language Models