- The LLaVA-PT data comes from LLaVA.
- The Hybrid-FT data comes from SViT, LVIS, LRV, and MIMIC-IT.
- The LLaVA-FT data comes from LLaVA.
- Download the training annotations. You can download them from Baidu Disk, Google Drive, Peking University Disk, or Hugging Face.
We also provide the processed data as follows. The links point to Baidu Disk.
| Data group | Usage | Link |
|---|---|---|
| LLaVA-PT | Stage 1 | LLaVA 1.5-558k |
| Hybrid-FT | Stage 2 | SViT-157k, LVIS-220k, LRV-331k, MIMIC-IT-256k |
| LLaVA-FT | Stage 3 | LLaVA 1.5-mix-665k |
For those who cannot easily access Baidu Disk, the data is also available on Hugging Face.
After downloading all of them, organize the data in IMAGE_FOLDER as follows.
IMAGE_FOLDER
├── llava_image
├── llava_image_tune
├── lvis_tune
├── lrv_tune
├── svit_tune
└── mimicit_tune

Specify your IMAGE_FOLDER and JSON_FOLDER according to the data preparation.
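As a quick sketch, the layout above can be created like this (the `./IMAGE_FOLDER` base path is just an example; substitute the directory your training configs point at):

```shell
# Create the expected IMAGE_FOLDER layout; the base path is an example.
IMAGE_FOLDER=./IMAGE_FOLDER
mkdir -p "$IMAGE_FOLDER"/llava_image \
         "$IMAGE_FOLDER"/llava_image_tune \
         "$IMAGE_FOLDER"/lvis_tune \
         "$IMAGE_FOLDER"/lrv_tune \
         "$IMAGE_FOLDER"/svit_tune \
         "$IMAGE_FOLDER"/mimicit_tune
ls "$IMAGE_FOLDER"
```

Then extract or move each downloaded archive into its matching subdirectory.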
For training at 384 resolution, we use google/siglip-so400m-patch14-384 as the image_tower. Note that if you pass `--image_tower google/siglip-so400m-patch14-384`, you should upgrade transformers to version 4.37.0.
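A minimal way to check whether an installed transformers version meets the 4.37.0 requirement (a simplified sketch that only handles plain `x.y.z` version strings; it is not part of the repo):

```python
# Minimum transformers version needed for the SigLIP 384 image tower.
REQUIRED = (4, 37, 0)

def parse_version(v: str) -> tuple:
    """Parse a plain 'x.y.z' version string into a comparable tuple of ints."""
    return tuple(int(part) for part in v.split(".")[:3])

def needs_upgrade(installed: str) -> bool:
    """Return True if the installed version is older than the required one."""
    return parse_version(installed) < REQUIRED

print(needs_upgrade("4.36.2"))  # True: too old for the SigLIP tower
print(needs_upgrade("4.37.0"))  # False: meets the requirement
```

In practice you would pass `transformers.__version__` to `needs_upgrade`.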
Qwen
- Stage 1 pretraining script: pretrain.sh.
- Stage 2 tuning script: finetune.sh.
- Stage 3 moe-tuning script: finetune_moe.sh.
Phi2
- Stage 1 pretraining script: pretrain.sh.
- Stage 2 tuning script: finetune.sh.
- Stage 3 moe-tuning script: finetune_moe.sh.
StableLM
- Stage 1 pretraining script: pretrain.sh.
- Stage 2 tuning script: finetune.sh.
- Stage 3 moe-tuning script: finetune_moe.sh.
OpenChat
- Stage 1 pretraining script: pretrain.sh.
- Stage 2 tuning script: finetune.sh.
- Stage 3 moe-tuning script: finetune_moe.sh.
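For any of the backbones above, the three stages run in order: pretraining, then tuning, then MoE-tuning. A dry-run sketch of that ordering (the `scripts/<model>/` path is an assumption, not confirmed by this README; use the actual location of the scripts in your checkout):

```shell
# Run the three training stages in order. The echo is a dry-run placeholder;
# replace it with:  bash scripts/<model>/$script  (path is an assumption).
for script in pretrain.sh finetune.sh finetune_moe.sh; do
    echo "Running stage script: $script"
done
```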
Coming soon...