Currently, Qwen2.5-VL has a bug when used with FlashAttention-2, so you need to disable FlashAttention-2 to train the model. I wrote a quick monkey patch as a fix for it. The script requires a dataset formatted according to ...
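
For reference, here is a minimal sketch of one way to load the model with FlashAttention-2 turned off, assuming the model is loaded through Hugging Face `transformers`. This is not the monkey patch mentioned above; it only selects a different attention backend through the `attn_implementation` argument of `from_pretrained`. The helper names `attention_kwargs` and `load_model` and the `model_id` parameter are hypothetical and used only for illustration.

```python
def attention_kwargs(use_flash_attention: bool = False) -> dict:
    """Build the attention-backend kwargs for `from_pretrained`.

    Hypothetical helper: falls back to PyTorch SDPA attention when
    FlashAttention-2 is disabled.
    """
    impl = "flash_attention_2" if use_flash_attention else "sdpa"
    return {"attn_implementation": impl}


def load_model(model_id: str, use_flash_attention: bool = False):
    """Load Qwen2.5-VL with the chosen attention backend.

    `model_id` is an assumption: pass whichever checkpoint name or
    local path you are training from.
    """
    # Imported lazily so the helper above works without transformers installed.
    from transformers import Qwen2_5_VLForConditionalGeneration

    return Qwen2_5_VLForConditionalGeneration.from_pretrained(
        model_id,
        torch_dtype="auto",
        **attention_kwargs(use_flash_attention),
    )
```

Calling `load_model(model_id)` with the default arguments loads the model with SDPA attention instead of FlashAttention-2; `"eager"` is another value that `attn_implementation` accepts if SDPA is not suitable.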