More details on fine-tuning with MotionVid-QA #5

Hello, authors!

I have some questions about fine-tuning the VLM as described in the paper.
Appendix D states: "During SFT, the vision tower, LLM, and merger components were all trainable. During DPO training, we exclusively trained the LLM part." This is not clear to me. I assume the LLM is LoRA-tuned, but I am not sure about the vision tower. Is the vision tower also LoRA-tuned, or is it fully fine-tuned?

Additionally, the fine-tuning settings, such as the number of epochs, are not clear.


Could you please help clarify these?


