Hello, authors!
I have some questions about fine-tuning the VLM following the paper.
The sentences "During SFT, the vision tower, LLM, and merger components were all trainable. During DPO training, we exclusively trained the LLM part." in Appendix D are not clear to me. I assume the LLM is LoRA-tuned, but I am not sure about the vision tower. Is the vision tower also LoRA-tuned, or is it fully fine-tuned?
Additionally, the fine-tuning settings, such as the number of epochs, are not clear.
Could you please help clarify these?