danielhanchen committed on
Commit 65cc206 · verified · 1 Parent(s): 95ef3f5

Add files using upload-large-folder tool
README.md CHANGED
@@ -1,59 +1,20 @@
 ---
-base_model: Qwen/Qwen2.5-VL-7B-Instruct
+base_model:
+- Qwen/Qwen2.5-VL-7B-Instruct
+license: apache-2.0
 language:
 - en
-library_name: transformers
 pipeline_tag: image-text-to-text
-license: apache-2.0
 tags:
 - multimodal
-- qwen
-- qwen2
 - unsloth
-- transformers
-- vision
+library_name: transformers
 ---
-<div>
-<p style="margin-bottom: 0;margin-top:0;">
-<em>View all of our uploaded models <a href="https://docs.unsloth.ai/get-started/all-our-models">here</em>
-</p>
-<div style="display: flex; gap: 5px; align-items: center;margin-top:0; ">
-<a href="https://github.com/unslothai/unsloth/">
-<img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="133">
-</a>
-<a href="https://discord.gg/unsloth">
-<img src="https://github.com/unslothai/unsloth/raw/main/images/Discord%20button.png" width="173">
-</a>
-<a href="https://docs.unsloth.ai/">
-<img src="https://raw.githubusercontent.com/unslothai/unsloth/refs/heads/main/images/documentation%20green%20button.png" width="143">
-</a>
-</div>
-<h1 style="margin-top: 0rem;">Finetune LLMs 2-5x faster with 70% less memory via Unsloth</h2>
-</div>
-We have a free Google Colab Tesla T4 notebook for Qwen2-VL (7B) here: https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen2_VL_(7B)-Vision.ipynb
-
-## ✨ Finetune for Free
-
-All notebooks are **beginner friendly**! Add your dataset, click "Run All", and you'll get a 2x faster finetuned model which can be exported to GGUF, vLLM or uploaded to Hugging Face.
-
-| Unsloth supports | Free Notebooks | Performance | Memory use |
-|-----------------|--------------------------------------------------------------------------------------------------------------------------|-------------|----------|
-| **Llama-3.2 (3B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(1B_and_3B)-Conversational.ipynb) | 2.4x faster | 58% less |
-| **Llama-3.2 (11B vision)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb) | 2x faster | 60% less |
-| **Qwen2 VL (7B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen2_VL_(7B)-Vision.ipynb) | 1.8x faster | 60% less |
-| **Qwen2.5 (7B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen2.5_(7B)-Alpaca.ipynb) | 2x faster | 60% less |
-| **Llama-3.1 (8B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-Alpaca.ipynb) | 2.4x faster | 58% less |
-| **Phi-3.5 (mini)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Phi_3.5_Mini-Conversational.ipynb) | 2x faster | 50% less |
-| **Gemma 2 (9B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma2_(9B)-Alpaca.ipynb) | 2.4x faster | 58% less |
-| **Mistral (7B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Mistral_v0.3_(7B)-Conversational.ipynb) | 2.2x faster | 62% less |
-
-[<img src="https://raw.githubusercontent.com/unslothai/unsloth/refs/heads/main/images/documentation%20green%20button.png" width="200"/>](https://docs.unsloth.ai)
-
-- This [Llama 3.2 conversational notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(1B_and_3B)-Conversational.ipynb) is useful for ShareGPT ChatML / Vicuna templates.
-- This [text completion notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Mistral_(7B)-Text_Completion.ipynb) is for raw text. This [DPO notebook](https://colab.research.google.com/drive/15vttTpzzVXv_tJwEk-hIcQ0S9FcEWvwP?usp=sharing) replicates Zephyr.
-- \* Kaggle has 2x T4s, but we use 1. Due to overhead, 1x T4 is 5x faster.
-
-# Qwen2.5-VL
+
+# Qwen2.5-VL-7B-Instruct
+<a href="https://chat.qwenlm.ai/" target="_blank" style="margin: 2px;">
+<img alt="Chat" src="https://img.shields.io/badge/%F0%9F%92%9C%EF%B8%8F%20Qwen%20Chat%20-536af5" style="display: inline-block; vertical-align: middle;"/>
+</a>
 
 ## Introduction
 
@@ -567,4 +528,3 @@ If you find our work helpful, feel free to give us a cite.
   year={2023}
 }
 ```
-
chat_template.jinja ADDED
@@ -0,0 +1,7 @@
+{% set image_count = namespace(value=0) %}{% set video_count = namespace(value=0) %}{% for message in messages %}{% if loop.first and message['role'] != 'system' %}<|im_start|>system
+You are a helpful assistant.<|im_end|>
+{% endif %}<|im_start|>{{ message['role'] }}
+{% if message['content'] is string %}{{ message['content'] }}<|im_end|>
+{% else %}{% for content in message['content'] %}{% if content['type'] == 'image' or 'image' in content or 'image_url' in content %}{% set image_count.value = image_count.value + 1 %}{% if add_vision_id %}Picture {{ image_count.value }}: {% endif %}<|vision_start|><|image_pad|><|vision_end|>{% elif content['type'] == 'video' or 'video' in content %}{% set video_count.value = video_count.value + 1 %}{% if add_vision_id %}Video {{ video_count.value }}: {% endif %}<|vision_start|><|video_pad|><|vision_end|>{% elif 'text' in content %}{{ content['text'] }}{% endif %}{% endfor %}<|im_end|>
+{% endif %}{% endfor %}{% if add_generation_prompt %}<|im_start|>assistant
+{% endif %}
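For orientation, a minimal sketch of how this template is typically rendered through `transformers` (the repo id matches the `_name_or_path` removed from config.json below; the message layout follows the string vs. list-of-content branches the template checks):

```python
from transformers import AutoProcessor

# Load the processor; it picks up the chat template shipped with the repo.
processor = AutoProcessor.from_pretrained("unsloth/Qwen2.5-VL-7B-Instruct")

# One user turn mixing an image and text, matching the template's
# list-of-content branch above.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

# With add_generation_prompt=True the template appends the trailing
# "<|im_start|>assistant" block; the image entry becomes
# <|vision_start|><|image_pad|><|vision_end|>.
prompt = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```

Since no system message is supplied, the template's first branch injects the default "You are a helpful assistant." turn before the user turn.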
config.json CHANGED
@@ -1,5 +1,4 @@
 {
-"_name_or_path": "unsloth/Qwen2.5-VL-7B-Instruct",
 "architectures": [
 "Qwen2_5_VLForConditionalGeneration"
 ],
@@ -10,7 +9,7 @@
 "image_token_id": 151655,
 "initializer_range": 0.02,
 "intermediate_size": 18944,
-"max_position_embeddings": 32768,
+"max_position_embeddings": 128000,
 "max_window_layers": 28,
 "model_type": "qwen2_5_vl",
 "num_attention_heads": 28,
@@ -49,20 +48,76 @@
 },
 "rope_theta": 1000000.0,
 "sliding_window": 32768,
+"text_config": {
+"architectures": [
+"Qwen2_5_VLForConditionalGeneration"
+],
+"attention_dropout": 0.0,
+"bos_token_id": 151643,
+"eos_token_id": 151645,
+"hidden_act": "silu",
+"hidden_size": 3584,
+"image_token_id": null,
+"initializer_range": 0.02,
+"intermediate_size": 18944,
+"max_position_embeddings": 128000,
+"max_window_layers": 28,
+"model_type": "qwen2_5_vl_text",
+"num_attention_heads": 28,
+"num_hidden_layers": 28,
+"num_key_value_heads": 4,
+"rms_norm_eps": 1e-06,
+"rope_scaling": {
+"mrope_section": [
+16,
+24,
+24
+],
+"rope_type": "default",
+"type": "default"
+},
+"rope_theta": 1000000.0,
+"sliding_window": 32768,
+"torch_dtype": "bfloat16",
+"use_cache": true,
+"use_sliding_window": false,
+"video_token_id": null,
+"vision_end_token_id": 151653,
+"vision_start_token_id": 151652,
+"vision_token_id": 151654,
+"vocab_size": 152064
+},
 "tie_word_embeddings": false,
 "torch_dtype": "bfloat16",
-"transformers_version": "4.49.0",
+"transformers_version": "4.52.0.dev0",
 "unsloth_fixed": true,
 "use_cache": true,
 "use_sliding_window": false,
 "video_token_id": 151656,
 "vision_config": {
+"depth": 32,
+"fullatt_block_indexes": [
+7,
+15,
+23,
+31
+],
+"hidden_act": "silu",
 "hidden_size": 1280,
+"in_channels": 3,
 "in_chans": 3,
+"initializer_range": 0.02,
+"intermediate_size": 3420,
 "model_type": "qwen2_5_vl",
+"num_heads": 16,
+"out_hidden_size": 3584,
+"patch_size": 14,
+"spatial_merge_size": 2,
 "spatial_patch_size": 14,
+"temporal_patch_size": 2,
 "tokens_per_second": 2,
-"torch_dtype": "bfloat16"
+"torch_dtype": "bfloat16",
+"window_size": 112
 },
 "vision_end_token_id": 151653,
 "vision_start_token_id": 151652,
generation_config.json CHANGED
@@ -5,11 +5,9 @@
 151645,
 151643
 ],
-"max_length": 32768,
+"max_length": 128000,
 "pad_token_id": 151654,
 "repetition_penalty": 1.05,
-"temperature": 0.1,
-"top_k": 1,
-"top_p": 0.001,
-"transformers_version": "4.49.0"
+"temperature": 1e-06,
+"transformers_version": "4.52.0.dev0"
 }
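Two decoding changes ride along with the context bump: `max_length` goes to 128000, and `top_k`/`top_p` are dropped while `temperature` falls from 0.1 to 1e-06, which makes default sampling effectively greedy. A sketch to confirm what ships:

```python
from transformers import GenerationConfig

gen_cfg = GenerationConfig.from_pretrained("unsloth/Qwen2.5-VL-7B-Instruct")

# temperature=1e-06 with no top_k/top_p behaves like greedy decoding.
print(gen_cfg.temperature)         # 1e-06
print(gen_cfg.max_length)          # 128000
print(gen_cfg.repetition_penalty)  # 1.05
```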
tokenizer_config.json CHANGED
@@ -195,16 +195,16 @@
 "<|video_pad|>"
 ],
 "bos_token": null,
-"chat_template": "{%- if tools %}\n {{- '<|im_start|>system\\n' }}\n {%- if messages[0]['role'] == 'system' %}\n {{- messages[0]['content'] }}\n {%- else %}\n {{- 'You are a helpful assistant.' }}\n {%- endif %}\n {{- \"\\n\\n# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n {%- if messages[0]['role'] == 'system' %}\n {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n {%- else %}\n {{- '<|im_start|>system\\nYou are a helpful assistant.<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) or (message.role == \"assistant\" and not message.tool_calls) %}\n {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {{- '<|im_start|>' + message.role }}\n {%- if message.content %}\n {{- '\\n' + message.content }}\n {%- endif %}\n {%- for tool_call in message.tool_calls %}\n {%- if tool_call.function is defined %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '\\n<tool_call>\\n{\"name\": \"' }}\n {{- tool_call.name }}\n {{- '\", \"arguments\": ' }}\n {{- tool_call.arguments | tojson }}\n {{- '}\\n</tool_call>' }}\n {%- endfor %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != \"tool\") %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{- message.content }}\n {{- '\\n</tool_response>' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n",
 "clean_up_tokenization_spaces": false,
 "eos_token": "<|im_end|>",
 "errors": "replace",
 "extra_special_tokens": {},
-"model_max_length": 32768,
+"model_max_length": 128000,
 "pad_token": "<|vision_pad|>",
 "padding_side": "left",
 "processor_class": "Qwen2_5_VLProcessor",
 "split_special_tokens": false,
 "tokenizer_class": "Qwen2Tokenizer",
-"unk_token": null
-}
+"unk_token": null,
+"chat_template": "{% set image_count = namespace(value=0) %}{% set video_count = namespace(value=0) %}{% for message in messages %}{% if loop.first and message['role'] != 'system' %}<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n{% endif %}<|im_start|>{{ message['role'] }}\n{% if message['content'] is string %}{{ message['content'] }}<|im_end|>\n{% else %}{% for content in message['content'] %}{% if content['type'] == 'image' or 'image' in content or 'image_url' in content %}{% set image_count.value = image_count.value + 1 %}{% if add_vision_id %}Picture {{ image_count.value }}: {% endif %}<|vision_start|><|image_pad|><|vision_end|>{% elif content['type'] == 'video' or 'video' in content %}{% set video_count.value = video_count.value + 1 %}{% if add_vision_id %}Video {{ video_count.value }}: {% endif %}<|vision_start|><|video_pad|><|vision_end|>{% elif 'text' in content %}{{ content['text'] }}{% endif %}{% endfor %}<|im_end|>\n{% endif %}{% endfor %}{% if add_generation_prompt %}<|im_start|>assistant\n{% endif %}"
+}
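Here the old text-only, tool-calling chat template is dropped from `tokenizer_config.json` in favor of the multimodal one (the same template added above as `chat_template.jinja`), and `model_max_length` follows the 128K context. Callers that relied on the removed `{%- if tools %}` branch would now need to supply a tool-calling template themselves. A sanity-check sketch, assuming the standard `AutoTokenizer` loading path:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("unsloth/Qwen2.5-VL-7B-Instruct")

print(tok.model_max_length)                  # 128000 (was 32768)
print("<|image_pad|>" in tok.chat_template)  # True: multimodal template is active
```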