Daemontatox and danielhanchen committed
Commit 0d5a46e · verified · 0 parent(s)

Duplicate from unsloth/aya-vision-8b

Co-authored-by: Daniel Han-Chen <danielhanchen@users.noreply.huggingface.co>

.gitattributes ADDED
@@ -0,0 +1,36 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,176 @@
+ ---
+ base_model: CohereForAI/aya-vision-8b
+ inference: false
+ library_name: transformers
+ language:
+ - en
+ - fr
+ - de
+ - es
+ - it
+ - pt
+ - ja
+ - ko
+ - zh
+ - ar
+ - el
+ - fa
+ - pl
+ - id
+ - cs
+ - he
+ - hi
+ - nl
+ - ro
+ - ru
+ - tr
+ - uk
+ - vi
+ license: cc-by-nc-4.0
+ extra_gated_prompt: >-
+   By submitting this form, you agree to the [License
+   Agreement](https://cohere.com/c4ai-cc-by-nc-license) and acknowledge that the
+   information you provide will be collected, used, and shared in accordance with
+   Cohere’s [Privacy Policy](https://cohere.com/privacy). You’ll receive email
+   updates about C4AI and Cohere research, events, products and services. You can
+   unsubscribe at any time.
+ extra_gated_fields:
+   Name: text
+   Affiliation: text
+   Country: country
+   I agree to use this model for non-commercial use ONLY: checkbox
+ pipeline_tag: image-text-to-text
+ ---
+
+ # Model Card for Aya Vision 8B
+
+ <img src="aya-vision-8B.png" width="650" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
+
+ **C4AI Aya Vision 8B** is an open-weights research release of an 8-billion-parameter model with advanced capabilities optimized for a variety of vision-language use cases, including OCR, captioning, visual reasoning, summarization, question answering, code, and more.
+ It is a multilingual model trained to excel in 23 languages in vision and language.
+
+ This model card corresponds to the 8-billion-parameter version of the Aya Vision model. We also released a 32-billion-parameter version, which you can find [here](https://huggingface.co/CohereForAI/aya-vision-32B).
+
+ - Developed by: [Cohere For AI](https://cohere.for.ai/)
+ - Point of Contact: Cohere For AI: [cohere.for.ai](https://cohere.for.ai/)
+ - License: [CC-BY-NC](https://cohere.com/c4ai-cc-by-nc-license); also requires adhering to [C4AI's Acceptable Use Policy](https://docs.cohere.com/docs/c4ai-acceptable-use-policy)
+ - Model: c4ai-aya-vision-8b
+ - Model Size: 8 billion parameters
+ - Context length: 16K
+
+ ## Try it: Aya Vision in Action
+
+ Before downloading the weights, you can try Aya Vision chat in the [Cohere playground](https://dashboard.cohere.com/playground/chat) or our dedicated [Hugging Face Space](https://huggingface.co/spaces/CohereForAI/aya_expanse) for interactive exploration.
+
+ ## WhatsApp Integration
+
+ You can also talk to Aya Vision through the popular messaging service WhatsApp. Use this [link](https://wa.me/14313028498) to open a WhatsApp chatbox with Aya Vision.
+
+ If you don’t have WhatsApp installed on your machine, you may need to install it first; if you have it on your phone, you can follow the on-screen instructions to link your phone to WhatsApp Web.
+ By the end, you should see a text window which you can use to chat with the model.
+ More details about our WhatsApp integration are available [here](https://docs.cohere.com/v2/docs/aya#aya-expanse-integration-with-whatsapp).
+
+ ## Example Notebook
+
+ You can also check out the following [notebook](https://colab.research.google.com/github/cohere-ai/cohere-developer-experience/blob/main/notebooks/guides/aya_vision_intro.ipynb) to understand how to use Aya Vision for different use cases.
+
+ ## How to Use Aya Vision
+
+ Please install `transformers` from the source repository that includes the necessary changes for this model:
+
+ ```python
+ # pip install 'git+https://github.com/huggingface/transformers.git@v4.49.0-AyaVision'
+ from transformers import AutoProcessor, AutoModelForImageTextToText
+ import torch
+
+ model_id = "CohereForAI/aya-vision-8b"
+
+ processor = AutoProcessor.from_pretrained(model_id)
+ model = AutoModelForImageTextToText.from_pretrained(
+     model_id, device_map="auto", torch_dtype=torch.float16
+ )
+
+ # Format message with the aya-vision chat template
+ messages = [
+     {"role": "user",
+      "content": [
+        {"type": "image", "url": "https://pbs.twimg.com/media/Fx7YvfQWYAIp6rZ?format=jpg&name=medium"},
+        {"type": "text", "text": "चित्र में लिखा पाठ क्या कहता है?"},  # "What does the text in the image say?"
+     ]},
+ ]
+
+ inputs = processor.apply_chat_template(
+     messages, padding=True, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt"
+ ).to(model.device)
+
+ gen_tokens = model.generate(
+     **inputs,
+     max_new_tokens=300,
+     do_sample=True,
+     temperature=0.3,
+ )
+
+ print(processor.tokenizer.decode(gen_tokens[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
+ ```
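+
+ If you want to sanity-check the prompt that the chat template produces before generating, `apply_chat_template` can also return the rendered text instead of token ids. The following is a minimal sketch (not part of the original card), reusing the `processor` and `messages` defined above:
+
+ ```python
+ # Render the chat template to a plain string so the <image> placeholder
+ # and the special turn tokens it inserts can be inspected.
+ prompt_text = processor.apply_chat_template(
+     messages, tokenize=False, add_generation_prompt=True
+ )
+ print(prompt_text)
+ ```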
+
+
+ You can also use the model directly with the transformers `pipeline` abstraction:
+
+ ```python
+ from transformers import pipeline
+
+ pipe = pipeline(model="CohereForAI/aya-vision-8b", task="image-text-to-text", device_map="auto")
+
+ # Format message with the aya-vision chat template
+ messages = [
+     {"role": "user",
+      "content": [
+        {"type": "image", "url": "https://media.istockphoto.com/id/458012057/photo/istanbul-turkey.jpg?s=612x612&w=0&k=20&c=qogAOVvkpfUyqLUMr_XJQyq-HkACXyYUSZbKhBlPrxo="},
+        {"type": "text", "text": "Bu resimde hangi anıt gösterilmektedir?"},  # "Which monument is shown in this image?"
+     ]},
+ ]
+ outputs = pipe(text=messages, max_new_tokens=300, return_full_text=False)
+
+ print(outputs)
+ ```
+
+ ## Model Details
+
+ **Input:** Model accepts input text and images.
+
+ **Output:** Model generates text.
+
+ **Model Architecture:** This is a vision-language model that uses a multilingual language model based on [C4AI Command R7B](https://huggingface.co/CohereForAI/c4ai-command-r7b-12-2024), further post-trained with the [Aya Expanse recipe](https://arxiv.org/abs/2412.04261), and paired with the [SigLIP2-patch14-384](https://huggingface.co/google/siglip2-so400m-patch14-384) vision encoder through a multimodal adapter for vision-language understanding.
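+
+ To see this composition reflected in the released checkpoint, you can inspect the sub-configurations directly. This is a minimal sketch (not part of the original card), assuming a `transformers` version with Aya Vision support; the field names follow the `config.json` shown later in this commit:
+
+ ```python
+ from transformers import AutoConfig
+
+ # The composite config exposes the language-model and vision-encoder parts,
+ # plus the adapter settings that connect them.
+ config = AutoConfig.from_pretrained("CohereForAI/aya-vision-8b")
+ print(config.model_type)                # "aya_vision"
+ print(config.text_config.model_type)    # "cohere2" (Command R7B-based language model)
+ print(config.vision_config.model_type)  # "siglip_vision_model"
+ print(config.alignment_intermediate_size, config.downsample_factor)
+ ```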
+
+ **Image Processing:** We use **169 visual tokens** to encode an image tile with a resolution of **364x364 pixels**. Input images of arbitrary size are mapped to the nearest supported resolution based on their aspect ratio. Aya Vision uses up to 12 input tiles plus a thumbnail (resized to 364x364), for a maximum of 2,197 image tokens.
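+
+ These numbers fit together as a simple calculation. The sketch below is a back-of-the-envelope check (not an official API); the patch size and downsample factor come from the `config.json` later in this commit:
+
+ ```python
+ # Each 364x364 tile is split into 14x14 patches by the SigLIP2 encoder,
+ # and the multimodal adapter downsamples by a factor of 2 per side.
+ patches_per_side = 364 // 14                    # 26
+ tokens_per_tile = (patches_per_side // 2) ** 2  # 13 * 13 = 169
+
+ # Up to 12 tiles plus one thumbnail gives the maximum image-token budget.
+ max_image_tokens = tokens_per_tile * (12 + 1)   # 169 * 13 = 2197
+ print(tokens_per_tile, max_image_tokens)        # 169 2197
+ ```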
+
+ **Languages covered:** The model has been trained on 23 languages: English, French, Spanish, Italian, German, Portuguese, Japanese, Korean, Arabic, Chinese (Simplified and Traditional), Russian, Polish, Turkish, Vietnamese, Dutch, Czech, Indonesian, Ukrainian, Romanian, Greek, Hindi, Hebrew, and Persian.
+
+ **Context length:** Aya Vision 8B supports a context length of 16K.
+
+ For more details about how the model was trained, check out [our blogpost](https://huggingface.co/blog/aya-vision).
+
+
+ ## Evaluation
+
+ We evaluated Aya Vision 8B against [Pangea 7B](https://huggingface.co/neulab/Pangea-7B), [Llama-3.2 11B Vision](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision), [Molmo-D 7B](https://huggingface.co/allenai/Molmo-7B-D-0924), [Qwen2.5-VL 7B](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct), [Pixtral 12B](https://huggingface.co/mistralai/Pixtral-12B-2409), and [Gemini Flash 1.5 8B](https://developers.googleblog.com/en/gemini-15-flash-8b-is-now-generally-available-for-use/) using the [Aya Vision Benchmark](https://huggingface.co/datasets/CohereForAI/AyaVisionBench) and [m-WildVision](https://huggingface.co/datasets/CohereForAI/m-WildVision).
+ Win rates were determined using claude-3-7-sonnet-20250219 as a judge, chosen for its superior judging performance compared to other models.
+
+ We also evaluated Aya Vision 8B’s performance on text-only input against the same models using [m-ArenaHard](https://huggingface.co/datasets/CohereForAI/m-ArenaHard), a challenging open-ended generation evaluation, with win rates judged by gpt-4o-2024-11-20.
+
+ <!-- <img src="Aya_Vision_8B_Combined_Win_Rates.png" width="650" style="margin-left:'auto' margin-right:'auto' display:'block'"/> -->
+ <img src="AyaVision8BWinRates(AyaVisionBench).png" width="650" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
+ <img src="AyaVision8BWinRates(m-WildVision).png" width="650" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
+ <img src="Aya_Vision_8BvsPangea(AyaVisionBench).png" width="650" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
+ <img src="EfficiencyvsPerformance.png" width="650" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
+
+
+ ### Model Card Contact
+
+ For errors or additional questions about details in this model card, contact info@for.ai.
+
+ ### Terms of Use
+
+ We hope that the release of this model will make community-based research efforts more accessible by putting the weights of a highly performant 8-billion-parameter vision-language model in the hands of researchers all over the world.
+
+ This model is governed by a [CC-BY-NC](https://cohere.com/c4ai-cc-by-nc-license) License with an acceptable use addendum, and also requires adhering to [C4AI's Acceptable Use Policy](https://docs.cohere.com/docs/c4ai-acceptable-use-policy).
chat_template.json ADDED
@@ -0,0 +1,3 @@
+ {
+ "chat_template": "{{ bos_token }}<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|># System Preamble\nYou are in contextual safety mode. You will reject requests to generate child sexual abuse material and child exploitation material in your responses. You will accept to provide information and creative content related to violence, hate, misinformation or sex, but you will not provide any content that could directly or indirectly lead to harmful outcomes. When analyzing images, carefully describe and interpret their content while avoiding any promotion of harm, misinformation, or bias.\n\nYou are Aya Vision, a vision-language model built by Cohere for AI. You have been trained on data in English, French, Spanish, Italian, German, Portuguese, Japanese, Korean, Modern Standard Arabic, Mandarin, Russian, Indonesian, Turkish, Dutch, Polish, Persian, Vietnamese, Czech, Hindi, Ukrainian, Romanian, Greek and Hebrew. You are capable of interpreting images, including describing them, answering questions about their contents, extracting textual information, and analyzing visual context. Your responses must maintain the highest standards of quality, accuracy, and safety.\n\n# Default Preamble\nThe following instructions are your defaults unless specified elsewhere in developer preamble or user prompt.\n- Your name is Aya Vision.\n- You are a large language model built by Cohere for AI.\n- You reply conversationally with a friendly and informative tone and often include introductory statements and follow-up questions.\n- If the input is ambiguous, ask clarifying follow-up questions.\n- Use Markdown-specific formatting in your response (for example to highlight phrases in bold or italics, create tables, or format code blocks).\n- Use LaTeX to generate mathematical notation for complex equations.\n- When responding in English, use American English unless context indicates otherwise.\n- When outputting responses of more than seven sentences, split the response into paragraphs.\n- Prefer the active voice.\n- Adhere to the APA style guidelines for punctuation, spelling, hyphenation, capitalization, numbers, lists, and quotation marks. 
Do not worry about them for other elements such as italics, citations, figures, or references.\n- Use gender-neutral pronouns for unspecified persons.\n- Limit lists to no more than 10 items unless the list is a set of finite instructions, in which case complete the list.\n- Use the third person when asked to write a summary.\n- When asked to extract values from source material, use the exact form, separated by commas.\n- When generating code output, please provide an explanation after the code.\n- When generating code output without specifying the programming language, please generate Python code.\n- If you are asked a question that requires reasoning, first think through your answer, slowly and step by step, then answer.\n<|END_OF_TURN_TOKEN|>\n{%- for message in messages -%}\n <|START_OF_TURN_TOKEN|>{{ message.role | replace(\"user\", \"<|USER_TOKEN|>\") | replace(\"assistant\", \"<|CHATBOT_TOKEN|><|START_RESPONSE|>\") | replace(\"system\", \"<|SYSTEM_TOKEN|>\") }}\n {%- if message.content is defined -%}\n {%- if message.content is string -%}\n{{ message.content }}\n {%- else -%}\n {%- for item in message.content | selectattr('type', 'equalto', 'image') -%}\n<image>\n {%- endfor -%}\n {%- for item in message.content | selectattr('type', 'equalto', 'text') -%}\n{{ item.text }}\n {%- endfor -%}\n {%- endif -%}\n {%- elif message.message is defined -%}\n {%- if message.message is string -%}\n{{ message.message }}\n {%- else -%}\n {%- for item in message.message | selectattr('type', 'equalto', 'image') -%}\n<image>\n {%- endfor -%}\n {%- for item in message.message | selectattr('type', 'equalto', 'text') -%}\n{{ item.text }}\n {%- endfor -%}\n {%- endif -%}\n {%- endif -%}\n {%- if message.role == \"assistant\" -%}\n<|END_RESPONSE|>\n {%- endif -%}\n<|END_OF_TURN_TOKEN|>\n{%- endfor -%}\n{%- if add_generation_prompt -%}\n<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>\n{%- endif -%}\n"
+ }
config.json ADDED
@@ -0,0 +1,61 @@
+ {
+   "adapter_layer_norm_eps": 1e-06,
+   "alignment_activation_fn": "swiglu",
+   "alignment_intermediate_size": 28672,
+   "architectures": [
+     "AyaVisionForConditionalGeneration"
+   ],
+   "bos_token_id": 5,
+   "downsample_factor": 2,
+   "eos_token_id": 255001,
+   "image_token_index": 255036,
+   "max_splits_per_img": 12,
+   "model_type": "aya_vision",
+   "pad_token_id": 0,
+   "projector_hidden_act": "gelu",
+   "text_config": {
+     "attention_bias": false,
+     "attention_dropout": 0.0,
+     "cache_implementation": "hybrid",
+     "head_dim": 128,
+     "hidden_act": "silu",
+     "hidden_size": 4096,
+     "initializer_range": 0.02,
+     "intermediate_size": 14336,
+     "layer_norm_eps": 1e-05,
+     "logit_scale": 0.25,
+     "max_position_embeddings": 8192,
+     "model_type": "cohere2",
+     "num_attention_heads": 32,
+     "num_hidden_layers": 32,
+     "num_key_value_heads": 8,
+     "rope_scaling": null,
+     "rope_theta": 50000,
+     "sliding_window": 4096,
+     "sliding_window_pattern": 4,
+     "torch_dtype": "bfloat16",
+     "use_cache": true,
+     "use_qk_norm": false,
+     "vocab_size": 256000
+   },
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.50.0.dev0",
+   "unsloth_fixed": true,
+   "vision_config": {
+     "attention_dropout": 0.0,
+     "hidden_act": "gelu_pytorch_tanh",
+     "hidden_size": 1152,
+     "image_size": 364,
+     "intermediate_size": 4304,
+     "layer_norm_eps": 1e-06,
+     "model_type": "siglip_vision_model",
+     "num_attention_heads": 16,
+     "num_channels": 3,
+     "num_hidden_layers": 27,
+     "patch_size": 14,
+     "torch_dtype": "bfloat16",
+     "vision_use_head": false
+   },
+   "vision_feature_layer": -1,
+   "vision_feature_select_strategy": "full"
+ }
generation_config.json ADDED
@@ -0,0 +1,8 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 5,
+   "cache_implementation": "hybrid",
+   "eos_token_id": 255001,
+   "pad_token_id": 0,
+   "transformers_version": "4.50.0.dev0"
+ }
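These defaults can also be read programmatically. A minimal sketch (not part of the original files), using the upstream model id from the README; sampling arguments passed to `generate()`, such as `temperature` in the README example, override them at call time:

```python
from transformers import GenerationConfig

gen_config = GenerationConfig.from_pretrained("CohereForAI/aya-vision-8b")
# Expected from the file above: bos=5, eos=255001, pad=0
print(gen_config.bos_token_id, gen_config.eos_token_id, gen_config.pad_token_id)
```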
model-00001-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:78e3682a39ce7517c0a33a29af95ef125081fe0e7be1157122e4bed1df3e531f
+ size 4932249664
model-00002-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3c64a234efd018087325144ac98fbb16aef6b5f6ca3f373e10bd64d697320605
+ size 4999720952
model-00003-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ff94c2d618d5ad89ac57531b473fc81ba40a4464259ddd333f6b0c5a6356f62e
+ size 4915826080
model-00004-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f24d25814cf9e560b614a47a574f3477b6da50181c2ef28e31ae700569538730
+ size 2415982184
model.safetensors.index.json ADDED
@@ -0,0 +1,708 @@
1
+ {
2
+ "metadata": {
3
+ "total_size": 17263684064
4
+ },
5
+ "weight_map": {
6
+ "language_model.model.embed_tokens.weight": "model-00001-of-00004.safetensors",
7
+ "language_model.model.layers.0.input_layernorm.weight": "model-00001-of-00004.safetensors",
8
+ "language_model.model.layers.0.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
9
+ "language_model.model.layers.0.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
10
+ "language_model.model.layers.0.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
11
+ "language_model.model.layers.0.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
12
+ "language_model.model.layers.0.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
13
+ "language_model.model.layers.0.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
14
+ "language_model.model.layers.0.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
15
+ "language_model.model.layers.1.input_layernorm.weight": "model-00001-of-00004.safetensors",
16
+ "language_model.model.layers.1.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
17
+ "language_model.model.layers.1.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
18
+ "language_model.model.layers.1.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
19
+ "language_model.model.layers.1.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
20
+ "language_model.model.layers.1.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
21
+ "language_model.model.layers.1.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
22
+ "language_model.model.layers.1.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
23
+ "language_model.model.layers.10.input_layernorm.weight": "model-00002-of-00004.safetensors",
24
+ "language_model.model.layers.10.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
25
+ "language_model.model.layers.10.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
26
+ "language_model.model.layers.10.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
27
+ "language_model.model.layers.10.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
28
+ "language_model.model.layers.10.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
29
+ "language_model.model.layers.10.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
30
+ "language_model.model.layers.10.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
31
+ "language_model.model.layers.11.input_layernorm.weight": "model-00002-of-00004.safetensors",
32
+ "language_model.model.layers.11.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
33
+ "language_model.model.layers.11.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
34
+ "language_model.model.layers.11.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
35
+ "language_model.model.layers.11.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
36
+ "language_model.model.layers.11.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
37
+ "language_model.model.layers.11.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
38
+ "language_model.model.layers.11.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
39
+ "language_model.model.layers.12.input_layernorm.weight": "model-00002-of-00004.safetensors",
40
+ "language_model.model.layers.12.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
41
+ "language_model.model.layers.12.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
42
+ "language_model.model.layers.12.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
43
+ "language_model.model.layers.12.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
44
+ "language_model.model.layers.12.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
45
+ "language_model.model.layers.12.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
46
+ "language_model.model.layers.12.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
47
+ "language_model.model.layers.13.input_layernorm.weight": "model-00002-of-00004.safetensors",
48
+ "language_model.model.layers.13.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
49
+ "language_model.model.layers.13.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
50
+ "language_model.model.layers.13.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
51
+ "language_model.model.layers.13.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
52
+ "language_model.model.layers.13.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
53
+ "language_model.model.layers.13.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
54
+ "language_model.model.layers.13.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
55
+ "language_model.model.layers.14.input_layernorm.weight": "model-00002-of-00004.safetensors",
56
+ "language_model.model.layers.14.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
57
+ "language_model.model.layers.14.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
58
+ "language_model.model.layers.14.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
59
+ "language_model.model.layers.14.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
60
+ "language_model.model.layers.14.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
61
+ "language_model.model.layers.14.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
62
+ "language_model.model.layers.14.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
63
+ "language_model.model.layers.15.input_layernorm.weight": "model-00003-of-00004.safetensors",
64
+ "language_model.model.layers.15.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
65
+ "language_model.model.layers.15.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
66
+ "language_model.model.layers.15.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
67
+ "language_model.model.layers.15.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
68
+ "language_model.model.layers.15.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
69
+ "language_model.model.layers.15.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
70
+ "language_model.model.layers.15.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
71
+ "language_model.model.layers.16.input_layernorm.weight": "model-00003-of-00004.safetensors",
72
+ "language_model.model.layers.16.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
73
+ "language_model.model.layers.16.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
74
+ "language_model.model.layers.16.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
75
+ "language_model.model.layers.16.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
76
+ "language_model.model.layers.16.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
77
+ "language_model.model.layers.16.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
78
+ "language_model.model.layers.16.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
79
+ "language_model.model.layers.17.input_layernorm.weight": "model-00003-of-00004.safetensors",
80
+ "language_model.model.layers.17.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
81
+ "language_model.model.layers.17.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
82
+ "language_model.model.layers.17.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
83
+ "language_model.model.layers.17.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
84
+ "language_model.model.layers.17.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
85
+ "language_model.model.layers.17.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
86
+ "language_model.model.layers.17.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
87
+ "language_model.model.layers.18.input_layernorm.weight": "model-00003-of-00004.safetensors",
88
+ "language_model.model.layers.18.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
89
+ "language_model.model.layers.18.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
90
+ "language_model.model.layers.18.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
91
+ "language_model.model.layers.18.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
92
+ "language_model.model.layers.18.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
93
+ "language_model.model.layers.18.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
94
+ "language_model.model.layers.18.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
95
+ "language_model.model.layers.19.input_layernorm.weight": "model-00003-of-00004.safetensors",
96
+ "language_model.model.layers.19.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
97
+ "language_model.model.layers.19.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
98
+ "language_model.model.layers.19.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
99
+ "language_model.model.layers.19.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
100
+ "language_model.model.layers.19.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
101
+ "language_model.model.layers.19.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
102
+ "language_model.model.layers.19.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
103
+ "language_model.model.layers.2.input_layernorm.weight": "model-00001-of-00004.safetensors",
104
+ "language_model.model.layers.2.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
105
+ "language_model.model.layers.2.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
106
+ "language_model.model.layers.2.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
107
+ "language_model.model.layers.2.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
108
+ "language_model.model.layers.2.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
109
+ "language_model.model.layers.2.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
110
+ "language_model.model.layers.2.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
111
+ "language_model.model.layers.20.input_layernorm.weight": "model-00003-of-00004.safetensors",
112
+ "language_model.model.layers.20.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
113
+ "language_model.model.layers.20.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
114
+ "language_model.model.layers.20.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
115
+ "language_model.model.layers.20.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
116
+ "language_model.model.layers.20.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
117
+ "language_model.model.layers.20.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
118
+ "language_model.model.layers.20.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
119
+ "language_model.model.layers.21.input_layernorm.weight": "model-00003-of-00004.safetensors",
120
+ "language_model.model.layers.21.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
121
+ "language_model.model.layers.21.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
122
+ "language_model.model.layers.21.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
123
+ "language_model.model.layers.21.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
124
+ "language_model.model.layers.21.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
125
+ "language_model.model.layers.21.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
126
+ "language_model.model.layers.21.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
127
+ "language_model.model.layers.22.input_layernorm.weight": "model-00003-of-00004.safetensors",
128
+ "language_model.model.layers.22.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
129
+ "language_model.model.layers.22.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
130
+ "language_model.model.layers.22.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
131
+ "language_model.model.layers.22.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
132
+ "language_model.model.layers.22.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
133
+ "language_model.model.layers.22.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
134
+ "language_model.model.layers.22.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
135
+ "language_model.model.layers.23.input_layernorm.weight": "model-00003-of-00004.safetensors",
136
+ "language_model.model.layers.23.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
137
+ "language_model.model.layers.23.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
138
+ "language_model.model.layers.23.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
139
+ "language_model.model.layers.23.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
140
+ "language_model.model.layers.23.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
141
+ "language_model.model.layers.23.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
142
+ "language_model.model.layers.23.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
143
+ "language_model.model.layers.24.input_layernorm.weight": "model-00003-of-00004.safetensors",
144
+ "language_model.model.layers.24.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
145
+ "language_model.model.layers.24.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
146
+ "language_model.model.layers.24.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
147
+ "language_model.model.layers.24.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
148
+ "language_model.model.layers.24.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
149
+ "language_model.model.layers.24.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
150
+ "language_model.model.layers.24.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
151
+ "language_model.model.layers.25.input_layernorm.weight": "model-00003-of-00004.safetensors",
152
+ "language_model.model.layers.25.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
153
+ "language_model.model.layers.25.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
154
+ "language_model.model.layers.25.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
155
+ "language_model.model.layers.25.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
156
+ "language_model.model.layers.25.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
157
+ "language_model.model.layers.25.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
158
+ "language_model.model.layers.25.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
159
+ "language_model.model.layers.26.input_layernorm.weight": "model-00004-of-00004.safetensors",
160
+ "language_model.model.layers.26.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
161
+ "language_model.model.layers.26.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
162
+ "language_model.model.layers.26.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
163
+ "language_model.model.layers.26.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
164
+ "language_model.model.layers.26.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
165
+ "language_model.model.layers.26.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
166
+ "language_model.model.layers.26.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
167
+ "language_model.model.layers.27.input_layernorm.weight": "model-00004-of-00004.safetensors",
168
+ "language_model.model.layers.27.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
169
+ "language_model.model.layers.27.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
170
+ "language_model.model.layers.27.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
171
+ "language_model.model.layers.27.self_attn.k_proj.weight": "model-00004-of-00004.safetensors",
172
+ "language_model.model.layers.27.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
173
+ "language_model.model.layers.27.self_attn.q_proj.weight": "model-00004-of-00004.safetensors",
174
+ "language_model.model.layers.27.self_attn.v_proj.weight": "model-00004-of-00004.safetensors",
175
+ "language_model.model.layers.28.input_layernorm.weight": "model-00004-of-00004.safetensors",
176
+ "language_model.model.layers.28.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
177
+ "language_model.model.layers.28.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
178
+ "language_model.model.layers.28.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
179
+ "language_model.model.layers.28.self_attn.k_proj.weight": "model-00004-of-00004.safetensors",
180
+ "language_model.model.layers.28.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
181
+ "language_model.model.layers.28.self_attn.q_proj.weight": "model-00004-of-00004.safetensors",
182
+ "language_model.model.layers.28.self_attn.v_proj.weight": "model-00004-of-00004.safetensors",
183
+ "language_model.model.layers.29.input_layernorm.weight": "model-00004-of-00004.safetensors",
184
+ "language_model.model.layers.29.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
185
+ "language_model.model.layers.29.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
186
+ "language_model.model.layers.29.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
187
+ "language_model.model.layers.29.self_attn.k_proj.weight": "model-00004-of-00004.safetensors",
188
+ "language_model.model.layers.29.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
189
+ "language_model.model.layers.29.self_attn.q_proj.weight": "model-00004-of-00004.safetensors",
190
+ "language_model.model.layers.29.self_attn.v_proj.weight": "model-00004-of-00004.safetensors",
191
+ "language_model.model.layers.3.input_layernorm.weight": "model-00002-of-00004.safetensors",
192
+ "language_model.model.layers.3.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
193
+ "language_model.model.layers.3.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
194
+ "language_model.model.layers.3.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
195
+ "language_model.model.layers.3.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
196
+ "language_model.model.layers.3.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
197
+ "language_model.model.layers.3.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
198
+ "language_model.model.layers.3.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
199
+ "language_model.model.layers.30.input_layernorm.weight": "model-00004-of-00004.safetensors",
200
+ "language_model.model.layers.30.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
201
+ "language_model.model.layers.30.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
202
+ "language_model.model.layers.30.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
203
+ "language_model.model.layers.30.self_attn.k_proj.weight": "model-00004-of-00004.safetensors",
204
+ "language_model.model.layers.30.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
205
+ "language_model.model.layers.30.self_attn.q_proj.weight": "model-00004-of-00004.safetensors",
206
+ "language_model.model.layers.30.self_attn.v_proj.weight": "model-00004-of-00004.safetensors",
207
+ "language_model.model.layers.31.input_layernorm.weight": "model-00004-of-00004.safetensors",
208
+ "language_model.model.layers.31.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
209
+ "language_model.model.layers.31.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
210
+ "language_model.model.layers.31.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
211
+ "language_model.model.layers.31.self_attn.k_proj.weight": "model-00004-of-00004.safetensors",
212
+ "language_model.model.layers.31.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
213
+ "language_model.model.layers.31.self_attn.q_proj.weight": "model-00004-of-00004.safetensors",
214
+ "language_model.model.layers.31.self_attn.v_proj.weight": "model-00004-of-00004.safetensors",
215
+ "language_model.model.layers.4.input_layernorm.weight": "model-00002-of-00004.safetensors",
216
+ "language_model.model.layers.4.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
217
+ "language_model.model.layers.4.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
218
+ "language_model.model.layers.4.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
219
+ "language_model.model.layers.4.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
220
+ "language_model.model.layers.4.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
221
+ "language_model.model.layers.4.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
222
+ "language_model.model.layers.4.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
223
+ "language_model.model.layers.5.input_layernorm.weight": "model-00002-of-00004.safetensors",
224
+ "language_model.model.layers.5.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
225
+ "language_model.model.layers.5.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
226
+ "language_model.model.layers.5.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
227
+ "language_model.model.layers.5.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
228
+ "language_model.model.layers.5.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
229
+ "language_model.model.layers.5.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
230
+ "language_model.model.layers.5.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
231
+ "language_model.model.layers.6.input_layernorm.weight": "model-00002-of-00004.safetensors",
232
+ "language_model.model.layers.6.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
233
+ "language_model.model.layers.6.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
234
+ "language_model.model.layers.6.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
235
+ "language_model.model.layers.6.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
236
+ "language_model.model.layers.6.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
237
+ "language_model.model.layers.6.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
238
+ "language_model.model.layers.6.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
239
+ "language_model.model.layers.7.input_layernorm.weight": "model-00002-of-00004.safetensors",
240
+ "language_model.model.layers.7.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
241
+ "language_model.model.layers.7.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
242
+ "language_model.model.layers.7.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
243
+ "language_model.model.layers.7.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
244
+ "language_model.model.layers.7.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
245
+ "language_model.model.layers.7.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
246
+ "language_model.model.layers.7.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
247
+ "language_model.model.layers.8.input_layernorm.weight": "model-00002-of-00004.safetensors",
248
+ "language_model.model.layers.8.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
249
+ "language_model.model.layers.8.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
250
+ "language_model.model.layers.8.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
251
+ "language_model.model.layers.8.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
252
+ "language_model.model.layers.8.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
253
+ "language_model.model.layers.8.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
254
+ "language_model.model.layers.8.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
255
+ "language_model.model.layers.9.input_layernorm.weight": "model-00002-of-00004.safetensors",
256
+ "language_model.model.layers.9.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
257
+ "language_model.model.layers.9.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
258
+ "language_model.model.layers.9.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
259
+ "language_model.model.layers.9.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
260
+ "language_model.model.layers.9.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
261
+ "language_model.model.layers.9.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
262
+ "language_model.model.layers.9.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
263
+ "language_model.model.norm.weight": "model-00004-of-00004.safetensors",
264
+ "multi_modal_projector.layernorm.bias": "model-00001-of-00004.safetensors",
265
+ "multi_modal_projector.layernorm.weight": "model-00001-of-00004.safetensors",
266
+ "multi_modal_projector.linear_1.bias": "model-00001-of-00004.safetensors",
267
+ "multi_modal_projector.linear_1.weight": "model-00001-of-00004.safetensors",
268
+ "multi_modal_projector.linear_2.bias": "model-00001-of-00004.safetensors",
269
+ "multi_modal_projector.linear_2.weight": "model-00001-of-00004.safetensors",
270
+ "vision_tower.vision_model.embeddings.patch_embedding.bias": "model-00001-of-00004.safetensors",
271
+ "vision_tower.vision_model.embeddings.patch_embedding.weight": "model-00001-of-00004.safetensors",
272
+ "vision_tower.vision_model.embeddings.position_embedding.weight": "model-00001-of-00004.safetensors",
273
+ "vision_tower.vision_model.encoder.layers.0.layer_norm1.bias": "model-00001-of-00004.safetensors",
274
+ "vision_tower.vision_model.encoder.layers.0.layer_norm1.weight": "model-00001-of-00004.safetensors",
275
+ "vision_tower.vision_model.encoder.layers.0.layer_norm2.bias": "model-00001-of-00004.safetensors",
276
+ "vision_tower.vision_model.encoder.layers.0.layer_norm2.weight": "model-00001-of-00004.safetensors",
277
+ "vision_tower.vision_model.encoder.layers.0.mlp.fc1.bias": "model-00001-of-00004.safetensors",
278
+ "vision_tower.vision_model.encoder.layers.0.mlp.fc1.weight": "model-00001-of-00004.safetensors",
279
+ "vision_tower.vision_model.encoder.layers.0.mlp.fc2.bias": "model-00001-of-00004.safetensors",
280
+ "vision_tower.vision_model.encoder.layers.0.mlp.fc2.weight": "model-00001-of-00004.safetensors",
281
+ "vision_tower.vision_model.encoder.layers.0.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
282
+ "vision_tower.vision_model.encoder.layers.0.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
283
+ "vision_tower.vision_model.encoder.layers.0.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
284
+ "vision_tower.vision_model.encoder.layers.0.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
285
+ "vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
286
+ "vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
287
+ "vision_tower.vision_model.encoder.layers.0.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
288
+ "vision_tower.vision_model.encoder.layers.0.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
289
+ "vision_tower.vision_model.encoder.layers.1.layer_norm1.bias": "model-00001-of-00004.safetensors",
290
+ "vision_tower.vision_model.encoder.layers.1.layer_norm1.weight": "model-00001-of-00004.safetensors",
291
+ "vision_tower.vision_model.encoder.layers.1.layer_norm2.bias": "model-00001-of-00004.safetensors",
292
+ "vision_tower.vision_model.encoder.layers.1.layer_norm2.weight": "model-00001-of-00004.safetensors",
293
+ "vision_tower.vision_model.encoder.layers.1.mlp.fc1.bias": "model-00001-of-00004.safetensors",
294
+ "vision_tower.vision_model.encoder.layers.1.mlp.fc1.weight": "model-00001-of-00004.safetensors",
295
+ "vision_tower.vision_model.encoder.layers.1.mlp.fc2.bias": "model-00001-of-00004.safetensors",
296
+ "vision_tower.vision_model.encoder.layers.1.mlp.fc2.weight": "model-00001-of-00004.safetensors",
297
+ "vision_tower.vision_model.encoder.layers.1.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
298
+ "vision_tower.vision_model.encoder.layers.1.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
299
+ "vision_tower.vision_model.encoder.layers.1.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
300
+ "vision_tower.vision_model.encoder.layers.1.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
301
+ "vision_tower.vision_model.encoder.layers.1.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
302
+ "vision_tower.vision_model.encoder.layers.1.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
303
+ "vision_tower.vision_model.encoder.layers.1.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
304
+ "vision_tower.vision_model.encoder.layers.1.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
305
+ "vision_tower.vision_model.encoder.layers.10.layer_norm1.bias": "model-00001-of-00004.safetensors",
306
+ "vision_tower.vision_model.encoder.layers.10.layer_norm1.weight": "model-00001-of-00004.safetensors",
307
+ "vision_tower.vision_model.encoder.layers.10.layer_norm2.bias": "model-00001-of-00004.safetensors",
308
+ "vision_tower.vision_model.encoder.layers.10.layer_norm2.weight": "model-00001-of-00004.safetensors",
309
+ "vision_tower.vision_model.encoder.layers.10.mlp.fc1.bias": "model-00001-of-00004.safetensors",
310
+ "vision_tower.vision_model.encoder.layers.10.mlp.fc1.weight": "model-00001-of-00004.safetensors",
311
+ "vision_tower.vision_model.encoder.layers.10.mlp.fc2.bias": "model-00001-of-00004.safetensors",
312
+ "vision_tower.vision_model.encoder.layers.10.mlp.fc2.weight": "model-00001-of-00004.safetensors",
313
+ "vision_tower.vision_model.encoder.layers.10.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
314
+ "vision_tower.vision_model.encoder.layers.10.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
315
+ "vision_tower.vision_model.encoder.layers.10.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
316
+ "vision_tower.vision_model.encoder.layers.10.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
317
+ "vision_tower.vision_model.encoder.layers.10.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
318
+ "vision_tower.vision_model.encoder.layers.10.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
319
+ "vision_tower.vision_model.encoder.layers.10.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
320
+ "vision_tower.vision_model.encoder.layers.10.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
321
+ "vision_tower.vision_model.encoder.layers.11.layer_norm1.bias": "model-00001-of-00004.safetensors",
322
+ "vision_tower.vision_model.encoder.layers.11.layer_norm1.weight": "model-00001-of-00004.safetensors",
323
+ "vision_tower.vision_model.encoder.layers.11.layer_norm2.bias": "model-00001-of-00004.safetensors",
324
+ "vision_tower.vision_model.encoder.layers.11.layer_norm2.weight": "model-00001-of-00004.safetensors",
325
+ "vision_tower.vision_model.encoder.layers.11.mlp.fc1.bias": "model-00001-of-00004.safetensors",
326
+ "vision_tower.vision_model.encoder.layers.11.mlp.fc1.weight": "model-00001-of-00004.safetensors",
327
+ "vision_tower.vision_model.encoder.layers.11.mlp.fc2.bias": "model-00001-of-00004.safetensors",
328
+ "vision_tower.vision_model.encoder.layers.11.mlp.fc2.weight": "model-00001-of-00004.safetensors",
329
+ "vision_tower.vision_model.encoder.layers.11.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
330
+ "vision_tower.vision_model.encoder.layers.11.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
331
+ "vision_tower.vision_model.encoder.layers.11.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
332
+ "vision_tower.vision_model.encoder.layers.11.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
333
+ "vision_tower.vision_model.encoder.layers.11.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
334
+ "vision_tower.vision_model.encoder.layers.11.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
335
+ "vision_tower.vision_model.encoder.layers.11.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
336
+ "vision_tower.vision_model.encoder.layers.11.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
337
+ "vision_tower.vision_model.encoder.layers.12.layer_norm1.bias": "model-00001-of-00004.safetensors",
338
+ "vision_tower.vision_model.encoder.layers.12.layer_norm1.weight": "model-00001-of-00004.safetensors",
339
+ "vision_tower.vision_model.encoder.layers.12.layer_norm2.bias": "model-00001-of-00004.safetensors",
340
+ "vision_tower.vision_model.encoder.layers.12.layer_norm2.weight": "model-00001-of-00004.safetensors",
341
+ "vision_tower.vision_model.encoder.layers.12.mlp.fc1.bias": "model-00001-of-00004.safetensors",
342
+ "vision_tower.vision_model.encoder.layers.12.mlp.fc1.weight": "model-00001-of-00004.safetensors",
343
+ "vision_tower.vision_model.encoder.layers.12.mlp.fc2.bias": "model-00001-of-00004.safetensors",
344
+ "vision_tower.vision_model.encoder.layers.12.mlp.fc2.weight": "model-00001-of-00004.safetensors",
345
+ "vision_tower.vision_model.encoder.layers.12.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
346
+ "vision_tower.vision_model.encoder.layers.12.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
347
+ "vision_tower.vision_model.encoder.layers.12.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
348
+ "vision_tower.vision_model.encoder.layers.12.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
349
+ "vision_tower.vision_model.encoder.layers.12.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
350
+ "vision_tower.vision_model.encoder.layers.12.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
351
+ "vision_tower.vision_model.encoder.layers.12.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
352
+ "vision_tower.vision_model.encoder.layers.12.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
353
+ "vision_tower.vision_model.encoder.layers.13.layer_norm1.bias": "model-00001-of-00004.safetensors",
354
+ "vision_tower.vision_model.encoder.layers.13.layer_norm1.weight": "model-00001-of-00004.safetensors",
355
+ "vision_tower.vision_model.encoder.layers.13.layer_norm2.bias": "model-00001-of-00004.safetensors",
356
+ "vision_tower.vision_model.encoder.layers.13.layer_norm2.weight": "model-00001-of-00004.safetensors",
357
+ "vision_tower.vision_model.encoder.layers.13.mlp.fc1.bias": "model-00001-of-00004.safetensors",
358
+ "vision_tower.vision_model.encoder.layers.13.mlp.fc1.weight": "model-00001-of-00004.safetensors",
359
+ "vision_tower.vision_model.encoder.layers.13.mlp.fc2.bias": "model-00001-of-00004.safetensors",
360
+ "vision_tower.vision_model.encoder.layers.13.mlp.fc2.weight": "model-00001-of-00004.safetensors",
361
+ "vision_tower.vision_model.encoder.layers.13.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
362
+ "vision_tower.vision_model.encoder.layers.13.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
363
+ "vision_tower.vision_model.encoder.layers.13.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
364
+ "vision_tower.vision_model.encoder.layers.13.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
365
+ "vision_tower.vision_model.encoder.layers.13.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
366
+ "vision_tower.vision_model.encoder.layers.13.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
367
+ "vision_tower.vision_model.encoder.layers.13.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
368
+ "vision_tower.vision_model.encoder.layers.13.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
369
+ "vision_tower.vision_model.encoder.layers.14.layer_norm1.bias": "model-00001-of-00004.safetensors",
370
+ "vision_tower.vision_model.encoder.layers.14.layer_norm1.weight": "model-00001-of-00004.safetensors",
371
+ "vision_tower.vision_model.encoder.layers.14.layer_norm2.bias": "model-00001-of-00004.safetensors",
372
+ "vision_tower.vision_model.encoder.layers.14.layer_norm2.weight": "model-00001-of-00004.safetensors",
373
+ "vision_tower.vision_model.encoder.layers.14.mlp.fc1.bias": "model-00001-of-00004.safetensors",
374
+ "vision_tower.vision_model.encoder.layers.14.mlp.fc1.weight": "model-00001-of-00004.safetensors",
375
+ "vision_tower.vision_model.encoder.layers.14.mlp.fc2.bias": "model-00001-of-00004.safetensors",
376
+ "vision_tower.vision_model.encoder.layers.14.mlp.fc2.weight": "model-00001-of-00004.safetensors",
377
+ "vision_tower.vision_model.encoder.layers.14.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
378
+ "vision_tower.vision_model.encoder.layers.14.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
379
+ "vision_tower.vision_model.encoder.layers.14.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
380
+ "vision_tower.vision_model.encoder.layers.14.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
381
+ "vision_tower.vision_model.encoder.layers.14.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
382
+ "vision_tower.vision_model.encoder.layers.14.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
383
+ "vision_tower.vision_model.encoder.layers.14.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
384
+ "vision_tower.vision_model.encoder.layers.14.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
385
+ "vision_tower.vision_model.encoder.layers.15.layer_norm1.bias": "model-00001-of-00004.safetensors",
386
+ "vision_tower.vision_model.encoder.layers.15.layer_norm1.weight": "model-00001-of-00004.safetensors",
387
+ "vision_tower.vision_model.encoder.layers.15.layer_norm2.bias": "model-00001-of-00004.safetensors",
388
+ "vision_tower.vision_model.encoder.layers.15.layer_norm2.weight": "model-00001-of-00004.safetensors",
389
+ "vision_tower.vision_model.encoder.layers.15.mlp.fc1.bias": "model-00001-of-00004.safetensors",
390
+ "vision_tower.vision_model.encoder.layers.15.mlp.fc1.weight": "model-00001-of-00004.safetensors",
391
+ "vision_tower.vision_model.encoder.layers.15.mlp.fc2.bias": "model-00001-of-00004.safetensors",
392
+ "vision_tower.vision_model.encoder.layers.15.mlp.fc2.weight": "model-00001-of-00004.safetensors",
393
+ "vision_tower.vision_model.encoder.layers.15.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
394
+ "vision_tower.vision_model.encoder.layers.15.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
395
+ "vision_tower.vision_model.encoder.layers.15.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
396
+ "vision_tower.vision_model.encoder.layers.15.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
397
+ "vision_tower.vision_model.encoder.layers.15.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
398
+ "vision_tower.vision_model.encoder.layers.15.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
399
+ "vision_tower.vision_model.encoder.layers.15.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
400
+ "vision_tower.vision_model.encoder.layers.15.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
401
+ "vision_tower.vision_model.encoder.layers.16.layer_norm1.bias": "model-00001-of-00004.safetensors",
402
+ "vision_tower.vision_model.encoder.layers.16.layer_norm1.weight": "model-00001-of-00004.safetensors",
403
+ "vision_tower.vision_model.encoder.layers.16.layer_norm2.bias": "model-00001-of-00004.safetensors",
404
+ "vision_tower.vision_model.encoder.layers.16.layer_norm2.weight": "model-00001-of-00004.safetensors",
405
+ "vision_tower.vision_model.encoder.layers.16.mlp.fc1.bias": "model-00001-of-00004.safetensors",
406
+ "vision_tower.vision_model.encoder.layers.16.mlp.fc1.weight": "model-00001-of-00004.safetensors",
407
+ "vision_tower.vision_model.encoder.layers.16.mlp.fc2.bias": "model-00001-of-00004.safetensors",
408
+ "vision_tower.vision_model.encoder.layers.16.mlp.fc2.weight": "model-00001-of-00004.safetensors",
409
+ "vision_tower.vision_model.encoder.layers.16.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
410
+ "vision_tower.vision_model.encoder.layers.16.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
411
+ "vision_tower.vision_model.encoder.layers.16.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
412
+ "vision_tower.vision_model.encoder.layers.16.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
413
+ "vision_tower.vision_model.encoder.layers.16.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
414
+ "vision_tower.vision_model.encoder.layers.16.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
415
+ "vision_tower.vision_model.encoder.layers.16.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
416
+ "vision_tower.vision_model.encoder.layers.16.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
417
+ "vision_tower.vision_model.encoder.layers.17.layer_norm1.bias": "model-00001-of-00004.safetensors",
418
+ "vision_tower.vision_model.encoder.layers.17.layer_norm1.weight": "model-00001-of-00004.safetensors",
419
+ "vision_tower.vision_model.encoder.layers.17.layer_norm2.bias": "model-00001-of-00004.safetensors",
420
+ "vision_tower.vision_model.encoder.layers.17.layer_norm2.weight": "model-00001-of-00004.safetensors",
421
+ "vision_tower.vision_model.encoder.layers.17.mlp.fc1.bias": "model-00001-of-00004.safetensors",
422
+ "vision_tower.vision_model.encoder.layers.17.mlp.fc1.weight": "model-00001-of-00004.safetensors",
423
+ "vision_tower.vision_model.encoder.layers.17.mlp.fc2.bias": "model-00001-of-00004.safetensors",
424
+ "vision_tower.vision_model.encoder.layers.17.mlp.fc2.weight": "model-00001-of-00004.safetensors",
425
+ "vision_tower.vision_model.encoder.layers.17.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
426
+ "vision_tower.vision_model.encoder.layers.17.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
427
+ "vision_tower.vision_model.encoder.layers.17.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
428
+ "vision_tower.vision_model.encoder.layers.17.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
429
+ "vision_tower.vision_model.encoder.layers.17.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
430
+ "vision_tower.vision_model.encoder.layers.17.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
431
+ "vision_tower.vision_model.encoder.layers.17.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
432
+ "vision_tower.vision_model.encoder.layers.17.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
433
+ "vision_tower.vision_model.encoder.layers.18.layer_norm1.bias": "model-00001-of-00004.safetensors",
434
+ "vision_tower.vision_model.encoder.layers.18.layer_norm1.weight": "model-00001-of-00004.safetensors",
435
+ "vision_tower.vision_model.encoder.layers.18.layer_norm2.bias": "model-00001-of-00004.safetensors",
436
+ "vision_tower.vision_model.encoder.layers.18.layer_norm2.weight": "model-00001-of-00004.safetensors",
437
+ "vision_tower.vision_model.encoder.layers.18.mlp.fc1.bias": "model-00001-of-00004.safetensors",
438
+ "vision_tower.vision_model.encoder.layers.18.mlp.fc1.weight": "model-00001-of-00004.safetensors",
439
+ "vision_tower.vision_model.encoder.layers.18.mlp.fc2.bias": "model-00001-of-00004.safetensors",
440
+ "vision_tower.vision_model.encoder.layers.18.mlp.fc2.weight": "model-00001-of-00004.safetensors",
441
+ "vision_tower.vision_model.encoder.layers.18.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
442
+ "vision_tower.vision_model.encoder.layers.18.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
443
+ "vision_tower.vision_model.encoder.layers.18.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
444
+ "vision_tower.vision_model.encoder.layers.18.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
445
+ "vision_tower.vision_model.encoder.layers.18.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
446
+ "vision_tower.vision_model.encoder.layers.18.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
447
+ "vision_tower.vision_model.encoder.layers.18.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
448
+ "vision_tower.vision_model.encoder.layers.18.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
449
+ "vision_tower.vision_model.encoder.layers.19.layer_norm1.bias": "model-00001-of-00004.safetensors",
450
+ "vision_tower.vision_model.encoder.layers.19.layer_norm1.weight": "model-00001-of-00004.safetensors",
451
+ "vision_tower.vision_model.encoder.layers.19.layer_norm2.bias": "model-00001-of-00004.safetensors",
452
+ "vision_tower.vision_model.encoder.layers.19.layer_norm2.weight": "model-00001-of-00004.safetensors",
453
+ "vision_tower.vision_model.encoder.layers.19.mlp.fc1.bias": "model-00001-of-00004.safetensors",
454
+ "vision_tower.vision_model.encoder.layers.19.mlp.fc1.weight": "model-00001-of-00004.safetensors",
455
+ "vision_tower.vision_model.encoder.layers.19.mlp.fc2.bias": "model-00001-of-00004.safetensors",
456
+ "vision_tower.vision_model.encoder.layers.19.mlp.fc2.weight": "model-00001-of-00004.safetensors",
457
+ "vision_tower.vision_model.encoder.layers.19.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
458
+ "vision_tower.vision_model.encoder.layers.19.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
459
+ "vision_tower.vision_model.encoder.layers.19.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
460
+ "vision_tower.vision_model.encoder.layers.19.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
461
+ "vision_tower.vision_model.encoder.layers.19.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
462
+ "vision_tower.vision_model.encoder.layers.19.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
463
+ "vision_tower.vision_model.encoder.layers.19.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
464
+ "vision_tower.vision_model.encoder.layers.19.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
465
+ "vision_tower.vision_model.encoder.layers.2.layer_norm1.bias": "model-00001-of-00004.safetensors",
466
+ "vision_tower.vision_model.encoder.layers.2.layer_norm1.weight": "model-00001-of-00004.safetensors",
467
+ "vision_tower.vision_model.encoder.layers.2.layer_norm2.bias": "model-00001-of-00004.safetensors",
468
+ "vision_tower.vision_model.encoder.layers.2.layer_norm2.weight": "model-00001-of-00004.safetensors",
469
+ "vision_tower.vision_model.encoder.layers.2.mlp.fc1.bias": "model-00001-of-00004.safetensors",
470
+ "vision_tower.vision_model.encoder.layers.2.mlp.fc1.weight": "model-00001-of-00004.safetensors",
471
+ "vision_tower.vision_model.encoder.layers.2.mlp.fc2.bias": "model-00001-of-00004.safetensors",
472
+ "vision_tower.vision_model.encoder.layers.2.mlp.fc2.weight": "model-00001-of-00004.safetensors",
473
+ "vision_tower.vision_model.encoder.layers.2.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
474
+ "vision_tower.vision_model.encoder.layers.2.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
475
+ "vision_tower.vision_model.encoder.layers.2.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
476
+ "vision_tower.vision_model.encoder.layers.2.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
477
+ "vision_tower.vision_model.encoder.layers.2.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
478
+ "vision_tower.vision_model.encoder.layers.2.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
479
+ "vision_tower.vision_model.encoder.layers.2.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
480
+ "vision_tower.vision_model.encoder.layers.2.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
481
+ "vision_tower.vision_model.encoder.layers.20.layer_norm1.bias": "model-00001-of-00004.safetensors",
482
+ "vision_tower.vision_model.encoder.layers.20.layer_norm1.weight": "model-00001-of-00004.safetensors",
483
+ "vision_tower.vision_model.encoder.layers.20.layer_norm2.bias": "model-00001-of-00004.safetensors",
484
+ "vision_tower.vision_model.encoder.layers.20.layer_norm2.weight": "model-00001-of-00004.safetensors",
485
+ "vision_tower.vision_model.encoder.layers.20.mlp.fc1.bias": "model-00001-of-00004.safetensors",
486
+ "vision_tower.vision_model.encoder.layers.20.mlp.fc1.weight": "model-00001-of-00004.safetensors",
487
+ "vision_tower.vision_model.encoder.layers.20.mlp.fc2.bias": "model-00001-of-00004.safetensors",
488
+ "vision_tower.vision_model.encoder.layers.20.mlp.fc2.weight": "model-00001-of-00004.safetensors",
489
+ "vision_tower.vision_model.encoder.layers.20.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
490
+ "vision_tower.vision_model.encoder.layers.20.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
491
+ "vision_tower.vision_model.encoder.layers.20.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
492
+ "vision_tower.vision_model.encoder.layers.20.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
493
+ "vision_tower.vision_model.encoder.layers.20.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
494
+ "vision_tower.vision_model.encoder.layers.20.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
495
+ "vision_tower.vision_model.encoder.layers.20.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
496
+ "vision_tower.vision_model.encoder.layers.20.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
497
+ "vision_tower.vision_model.encoder.layers.21.layer_norm1.bias": "model-00001-of-00004.safetensors",
498
+ "vision_tower.vision_model.encoder.layers.21.layer_norm1.weight": "model-00001-of-00004.safetensors",
499
+ "vision_tower.vision_model.encoder.layers.21.layer_norm2.bias": "model-00001-of-00004.safetensors",
500
+ "vision_tower.vision_model.encoder.layers.21.layer_norm2.weight": "model-00001-of-00004.safetensors",
501
+ "vision_tower.vision_model.encoder.layers.21.mlp.fc1.bias": "model-00001-of-00004.safetensors",
502
+ "vision_tower.vision_model.encoder.layers.21.mlp.fc1.weight": "model-00001-of-00004.safetensors",
503
+ "vision_tower.vision_model.encoder.layers.21.mlp.fc2.bias": "model-00001-of-00004.safetensors",
504
+ "vision_tower.vision_model.encoder.layers.21.mlp.fc2.weight": "model-00001-of-00004.safetensors",
505
+ "vision_tower.vision_model.encoder.layers.21.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
506
+ "vision_tower.vision_model.encoder.layers.21.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
507
+ "vision_tower.vision_model.encoder.layers.21.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
508
+ "vision_tower.vision_model.encoder.layers.21.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
509
+ "vision_tower.vision_model.encoder.layers.21.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
510
+ "vision_tower.vision_model.encoder.layers.21.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
511
+ "vision_tower.vision_model.encoder.layers.21.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
512
+ "vision_tower.vision_model.encoder.layers.21.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
513
+ "vision_tower.vision_model.encoder.layers.22.layer_norm1.bias": "model-00001-of-00004.safetensors",
514
+ "vision_tower.vision_model.encoder.layers.22.layer_norm1.weight": "model-00001-of-00004.safetensors",
515
+ "vision_tower.vision_model.encoder.layers.22.layer_norm2.bias": "model-00001-of-00004.safetensors",
516
+ "vision_tower.vision_model.encoder.layers.22.layer_norm2.weight": "model-00001-of-00004.safetensors",
517
+ "vision_tower.vision_model.encoder.layers.22.mlp.fc1.bias": "model-00001-of-00004.safetensors",
518
+ "vision_tower.vision_model.encoder.layers.22.mlp.fc1.weight": "model-00001-of-00004.safetensors",
519
+ "vision_tower.vision_model.encoder.layers.22.mlp.fc2.bias": "model-00001-of-00004.safetensors",
520
+ "vision_tower.vision_model.encoder.layers.22.mlp.fc2.weight": "model-00001-of-00004.safetensors",
521
+ "vision_tower.vision_model.encoder.layers.22.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
522
+ "vision_tower.vision_model.encoder.layers.22.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
523
+ "vision_tower.vision_model.encoder.layers.22.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
524
+ "vision_tower.vision_model.encoder.layers.22.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
525
+ "vision_tower.vision_model.encoder.layers.22.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
526
+ "vision_tower.vision_model.encoder.layers.22.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
527
+ "vision_tower.vision_model.encoder.layers.22.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
528
+ "vision_tower.vision_model.encoder.layers.22.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
529
+ "vision_tower.vision_model.encoder.layers.23.layer_norm1.bias": "model-00001-of-00004.safetensors",
530
+ "vision_tower.vision_model.encoder.layers.23.layer_norm1.weight": "model-00001-of-00004.safetensors",
531
+ "vision_tower.vision_model.encoder.layers.23.layer_norm2.bias": "model-00001-of-00004.safetensors",
532
+ "vision_tower.vision_model.encoder.layers.23.layer_norm2.weight": "model-00001-of-00004.safetensors",
533
+ "vision_tower.vision_model.encoder.layers.23.mlp.fc1.bias": "model-00001-of-00004.safetensors",
534
+ "vision_tower.vision_model.encoder.layers.23.mlp.fc1.weight": "model-00001-of-00004.safetensors",
535
+ "vision_tower.vision_model.encoder.layers.23.mlp.fc2.bias": "model-00001-of-00004.safetensors",
536
+ "vision_tower.vision_model.encoder.layers.23.mlp.fc2.weight": "model-00001-of-00004.safetensors",
537
+ "vision_tower.vision_model.encoder.layers.23.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
538
+ "vision_tower.vision_model.encoder.layers.23.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
539
+ "vision_tower.vision_model.encoder.layers.23.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
540
+ "vision_tower.vision_model.encoder.layers.23.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
541
+ "vision_tower.vision_model.encoder.layers.23.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
542
+ "vision_tower.vision_model.encoder.layers.23.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
543
+ "vision_tower.vision_model.encoder.layers.23.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
544
+ "vision_tower.vision_model.encoder.layers.23.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
545
+ "vision_tower.vision_model.encoder.layers.24.layer_norm1.bias": "model-00001-of-00004.safetensors",
546
+ "vision_tower.vision_model.encoder.layers.24.layer_norm1.weight": "model-00001-of-00004.safetensors",
547
+ "vision_tower.vision_model.encoder.layers.24.layer_norm2.bias": "model-00001-of-00004.safetensors",
548
+ "vision_tower.vision_model.encoder.layers.24.layer_norm2.weight": "model-00001-of-00004.safetensors",
549
+ "vision_tower.vision_model.encoder.layers.24.mlp.fc1.bias": "model-00001-of-00004.safetensors",
550
+ "vision_tower.vision_model.encoder.layers.24.mlp.fc1.weight": "model-00001-of-00004.safetensors",
551
+ "vision_tower.vision_model.encoder.layers.24.mlp.fc2.bias": "model-00001-of-00004.safetensors",
552
+ "vision_tower.vision_model.encoder.layers.24.mlp.fc2.weight": "model-00001-of-00004.safetensors",
553
+ "vision_tower.vision_model.encoder.layers.24.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
554
+ "vision_tower.vision_model.encoder.layers.24.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
555
+ "vision_tower.vision_model.encoder.layers.24.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
556
+ "vision_tower.vision_model.encoder.layers.24.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
557
+ "vision_tower.vision_model.encoder.layers.24.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
558
+ "vision_tower.vision_model.encoder.layers.24.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
559
+ "vision_tower.vision_model.encoder.layers.24.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
560
+ "vision_tower.vision_model.encoder.layers.24.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
561
+ "vision_tower.vision_model.encoder.layers.25.layer_norm1.bias": "model-00001-of-00004.safetensors",
562
+ "vision_tower.vision_model.encoder.layers.25.layer_norm1.weight": "model-00001-of-00004.safetensors",
563
+ "vision_tower.vision_model.encoder.layers.25.layer_norm2.bias": "model-00001-of-00004.safetensors",
564
+ "vision_tower.vision_model.encoder.layers.25.layer_norm2.weight": "model-00001-of-00004.safetensors",
565
+ "vision_tower.vision_model.encoder.layers.25.mlp.fc1.bias": "model-00001-of-00004.safetensors",
566
+ "vision_tower.vision_model.encoder.layers.25.mlp.fc1.weight": "model-00001-of-00004.safetensors",
567
+ "vision_tower.vision_model.encoder.layers.25.mlp.fc2.bias": "model-00001-of-00004.safetensors",
568
+ "vision_tower.vision_model.encoder.layers.25.mlp.fc2.weight": "model-00001-of-00004.safetensors",
569
+ "vision_tower.vision_model.encoder.layers.25.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
570
+ "vision_tower.vision_model.encoder.layers.25.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
571
+ "vision_tower.vision_model.encoder.layers.25.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
572
+ "vision_tower.vision_model.encoder.layers.25.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
573
+ "vision_tower.vision_model.encoder.layers.25.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
574
+ "vision_tower.vision_model.encoder.layers.25.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
575
+ "vision_tower.vision_model.encoder.layers.25.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
576
+ "vision_tower.vision_model.encoder.layers.25.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
577
+ "vision_tower.vision_model.encoder.layers.26.layer_norm1.bias": "model-00001-of-00004.safetensors",
578
+ "vision_tower.vision_model.encoder.layers.26.layer_norm1.weight": "model-00001-of-00004.safetensors",
579
+ "vision_tower.vision_model.encoder.layers.26.layer_norm2.bias": "model-00001-of-00004.safetensors",
580
+ "vision_tower.vision_model.encoder.layers.26.layer_norm2.weight": "model-00001-of-00004.safetensors",
581
+ "vision_tower.vision_model.encoder.layers.26.mlp.fc1.bias": "model-00001-of-00004.safetensors",
582
+ "vision_tower.vision_model.encoder.layers.26.mlp.fc1.weight": "model-00001-of-00004.safetensors",
583
+ "vision_tower.vision_model.encoder.layers.26.mlp.fc2.bias": "model-00001-of-00004.safetensors",
584
+ "vision_tower.vision_model.encoder.layers.26.mlp.fc2.weight": "model-00001-of-00004.safetensors",
585
+ "vision_tower.vision_model.encoder.layers.26.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
586
+ "vision_tower.vision_model.encoder.layers.26.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
587
+ "vision_tower.vision_model.encoder.layers.26.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
588
+ "vision_tower.vision_model.encoder.layers.26.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
589
+ "vision_tower.vision_model.encoder.layers.26.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
590
+ "vision_tower.vision_model.encoder.layers.26.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
591
+ "vision_tower.vision_model.encoder.layers.26.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
592
+ "vision_tower.vision_model.encoder.layers.26.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
593
+ "vision_tower.vision_model.encoder.layers.3.layer_norm1.bias": "model-00001-of-00004.safetensors",
594
+ "vision_tower.vision_model.encoder.layers.3.layer_norm1.weight": "model-00001-of-00004.safetensors",
595
+ "vision_tower.vision_model.encoder.layers.3.layer_norm2.bias": "model-00001-of-00004.safetensors",
596
+ "vision_tower.vision_model.encoder.layers.3.layer_norm2.weight": "model-00001-of-00004.safetensors",
597
+ "vision_tower.vision_model.encoder.layers.3.mlp.fc1.bias": "model-00001-of-00004.safetensors",
598
+ "vision_tower.vision_model.encoder.layers.3.mlp.fc1.weight": "model-00001-of-00004.safetensors",
599
+ "vision_tower.vision_model.encoder.layers.3.mlp.fc2.bias": "model-00001-of-00004.safetensors",
600
+ "vision_tower.vision_model.encoder.layers.3.mlp.fc2.weight": "model-00001-of-00004.safetensors",
601
+ "vision_tower.vision_model.encoder.layers.3.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
602
+ "vision_tower.vision_model.encoder.layers.3.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
603
+ "vision_tower.vision_model.encoder.layers.3.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
604
+ "vision_tower.vision_model.encoder.layers.3.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
605
+ "vision_tower.vision_model.encoder.layers.3.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
606
+ "vision_tower.vision_model.encoder.layers.3.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
607
+ "vision_tower.vision_model.encoder.layers.3.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
608
+ "vision_tower.vision_model.encoder.layers.3.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
609
+ "vision_tower.vision_model.encoder.layers.4.layer_norm1.bias": "model-00001-of-00004.safetensors",
610
+ "vision_tower.vision_model.encoder.layers.4.layer_norm1.weight": "model-00001-of-00004.safetensors",
611
+ "vision_tower.vision_model.encoder.layers.4.layer_norm2.bias": "model-00001-of-00004.safetensors",
612
+ "vision_tower.vision_model.encoder.layers.4.layer_norm2.weight": "model-00001-of-00004.safetensors",
613
+ "vision_tower.vision_model.encoder.layers.4.mlp.fc1.bias": "model-00001-of-00004.safetensors",
614
+ "vision_tower.vision_model.encoder.layers.4.mlp.fc1.weight": "model-00001-of-00004.safetensors",
615
+ "vision_tower.vision_model.encoder.layers.4.mlp.fc2.bias": "model-00001-of-00004.safetensors",
616
+ "vision_tower.vision_model.encoder.layers.4.mlp.fc2.weight": "model-00001-of-00004.safetensors",
617
+ "vision_tower.vision_model.encoder.layers.4.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
618
+ "vision_tower.vision_model.encoder.layers.4.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
619
+ "vision_tower.vision_model.encoder.layers.4.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
620
+ "vision_tower.vision_model.encoder.layers.4.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
621
+ "vision_tower.vision_model.encoder.layers.4.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
622
+ "vision_tower.vision_model.encoder.layers.4.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
623
+ "vision_tower.vision_model.encoder.layers.4.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
624
+ "vision_tower.vision_model.encoder.layers.4.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
625
+ "vision_tower.vision_model.encoder.layers.5.layer_norm1.bias": "model-00001-of-00004.safetensors",
626
+ "vision_tower.vision_model.encoder.layers.5.layer_norm1.weight": "model-00001-of-00004.safetensors",
627
+ "vision_tower.vision_model.encoder.layers.5.layer_norm2.bias": "model-00001-of-00004.safetensors",
628
+ "vision_tower.vision_model.encoder.layers.5.layer_norm2.weight": "model-00001-of-00004.safetensors",
629
+ "vision_tower.vision_model.encoder.layers.5.mlp.fc1.bias": "model-00001-of-00004.safetensors",
630
+ "vision_tower.vision_model.encoder.layers.5.mlp.fc1.weight": "model-00001-of-00004.safetensors",
631
+ "vision_tower.vision_model.encoder.layers.5.mlp.fc2.bias": "model-00001-of-00004.safetensors",
632
+ "vision_tower.vision_model.encoder.layers.5.mlp.fc2.weight": "model-00001-of-00004.safetensors",
633
+ "vision_tower.vision_model.encoder.layers.5.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
634
+ "vision_tower.vision_model.encoder.layers.5.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
635
+ "vision_tower.vision_model.encoder.layers.5.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
636
+ "vision_tower.vision_model.encoder.layers.5.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
637
+ "vision_tower.vision_model.encoder.layers.5.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
638
+ "vision_tower.vision_model.encoder.layers.5.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
639
+ "vision_tower.vision_model.encoder.layers.5.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
640
+ "vision_tower.vision_model.encoder.layers.5.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
641
+ "vision_tower.vision_model.encoder.layers.6.layer_norm1.bias": "model-00001-of-00004.safetensors",
642
+ "vision_tower.vision_model.encoder.layers.6.layer_norm1.weight": "model-00001-of-00004.safetensors",
643
+ "vision_tower.vision_model.encoder.layers.6.layer_norm2.bias": "model-00001-of-00004.safetensors",
644
+ "vision_tower.vision_model.encoder.layers.6.layer_norm2.weight": "model-00001-of-00004.safetensors",
645
+ "vision_tower.vision_model.encoder.layers.6.mlp.fc1.bias": "model-00001-of-00004.safetensors",
646
+ "vision_tower.vision_model.encoder.layers.6.mlp.fc1.weight": "model-00001-of-00004.safetensors",
647
+ "vision_tower.vision_model.encoder.layers.6.mlp.fc2.bias": "model-00001-of-00004.safetensors",
648
+ "vision_tower.vision_model.encoder.layers.6.mlp.fc2.weight": "model-00001-of-00004.safetensors",
649
+ "vision_tower.vision_model.encoder.layers.6.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
650
+ "vision_tower.vision_model.encoder.layers.6.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
651
+ "vision_tower.vision_model.encoder.layers.6.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
652
+ "vision_tower.vision_model.encoder.layers.6.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
653
+ "vision_tower.vision_model.encoder.layers.6.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
654
+ "vision_tower.vision_model.encoder.layers.6.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
655
+ "vision_tower.vision_model.encoder.layers.6.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
656
+ "vision_tower.vision_model.encoder.layers.6.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
657
+ "vision_tower.vision_model.encoder.layers.7.layer_norm1.bias": "model-00001-of-00004.safetensors",
658
+ "vision_tower.vision_model.encoder.layers.7.layer_norm1.weight": "model-00001-of-00004.safetensors",
659
+ "vision_tower.vision_model.encoder.layers.7.layer_norm2.bias": "model-00001-of-00004.safetensors",
660
+ "vision_tower.vision_model.encoder.layers.7.layer_norm2.weight": "model-00001-of-00004.safetensors",
661
+ "vision_tower.vision_model.encoder.layers.7.mlp.fc1.bias": "model-00001-of-00004.safetensors",
662
+ "vision_tower.vision_model.encoder.layers.7.mlp.fc1.weight": "model-00001-of-00004.safetensors",
663
+ "vision_tower.vision_model.encoder.layers.7.mlp.fc2.bias": "model-00001-of-00004.safetensors",
664
+ "vision_tower.vision_model.encoder.layers.7.mlp.fc2.weight": "model-00001-of-00004.safetensors",
665
+ "vision_tower.vision_model.encoder.layers.7.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
666
+ "vision_tower.vision_model.encoder.layers.7.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
667
+ "vision_tower.vision_model.encoder.layers.7.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
668
+ "vision_tower.vision_model.encoder.layers.7.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
669
+ "vision_tower.vision_model.encoder.layers.7.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
670
+ "vision_tower.vision_model.encoder.layers.7.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
671
+ "vision_tower.vision_model.encoder.layers.7.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
672
+ "vision_tower.vision_model.encoder.layers.7.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
673
+ "vision_tower.vision_model.encoder.layers.8.layer_norm1.bias": "model-00001-of-00004.safetensors",
674
+ "vision_tower.vision_model.encoder.layers.8.layer_norm1.weight": "model-00001-of-00004.safetensors",
675
+ "vision_tower.vision_model.encoder.layers.8.layer_norm2.bias": "model-00001-of-00004.safetensors",
676
+ "vision_tower.vision_model.encoder.layers.8.layer_norm2.weight": "model-00001-of-00004.safetensors",
677
+ "vision_tower.vision_model.encoder.layers.8.mlp.fc1.bias": "model-00001-of-00004.safetensors",
678
+ "vision_tower.vision_model.encoder.layers.8.mlp.fc1.weight": "model-00001-of-00004.safetensors",
679
+ "vision_tower.vision_model.encoder.layers.8.mlp.fc2.bias": "model-00001-of-00004.safetensors",
680
+ "vision_tower.vision_model.encoder.layers.8.mlp.fc2.weight": "model-00001-of-00004.safetensors",
681
+ "vision_tower.vision_model.encoder.layers.8.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
682
+ "vision_tower.vision_model.encoder.layers.8.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
683
+ "vision_tower.vision_model.encoder.layers.8.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
684
+ "vision_tower.vision_model.encoder.layers.8.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
685
+ "vision_tower.vision_model.encoder.layers.8.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
686
+ "vision_tower.vision_model.encoder.layers.8.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
687
+ "vision_tower.vision_model.encoder.layers.8.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
688
+ "vision_tower.vision_model.encoder.layers.8.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
689
+ "vision_tower.vision_model.encoder.layers.9.layer_norm1.bias": "model-00001-of-00004.safetensors",
690
+ "vision_tower.vision_model.encoder.layers.9.layer_norm1.weight": "model-00001-of-00004.safetensors",
691
+ "vision_tower.vision_model.encoder.layers.9.layer_norm2.bias": "model-00001-of-00004.safetensors",
692
+ "vision_tower.vision_model.encoder.layers.9.layer_norm2.weight": "model-00001-of-00004.safetensors",
693
+ "vision_tower.vision_model.encoder.layers.9.mlp.fc1.bias": "model-00001-of-00004.safetensors",
694
+ "vision_tower.vision_model.encoder.layers.9.mlp.fc1.weight": "model-00001-of-00004.safetensors",
695
+ "vision_tower.vision_model.encoder.layers.9.mlp.fc2.bias": "model-00001-of-00004.safetensors",
696
+ "vision_tower.vision_model.encoder.layers.9.mlp.fc2.weight": "model-00001-of-00004.safetensors",
697
+ "vision_tower.vision_model.encoder.layers.9.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
698
+ "vision_tower.vision_model.encoder.layers.9.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
699
+ "vision_tower.vision_model.encoder.layers.9.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
700
+ "vision_tower.vision_model.encoder.layers.9.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
701
+ "vision_tower.vision_model.encoder.layers.9.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
702
+ "vision_tower.vision_model.encoder.layers.9.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
703
+ "vision_tower.vision_model.encoder.layers.9.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
704
+ "vision_tower.vision_model.encoder.layers.9.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
705
+ "vision_tower.vision_model.post_layernorm.bias": "model-00001-of-00004.safetensors",
706
+ "vision_tower.vision_model.post_layernorm.weight": "model-00001-of-00004.safetensors"
707
+ }
708
+ }
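The weight_map above (part of model.safetensors.index.json) records which of the four safetensors shards stores each parameter; every vision_tower tensor listed here lives in model-00001-of-00004.safetensors. Below is a minimal sketch, assuming the index file and shards have been downloaded locally, of resolving a single tensor through that map:

```python
# Minimal sketch: look up which shard holds a parameter and load only that tensor.
# Assumes model.safetensors.index.json and the shards sit in the current directory.
import json
from safetensors import safe_open

with open("model.safetensors.index.json") as f:
    index = json.load(f)

name = "vision_tower.vision_model.post_layernorm.weight"
shard = index["weight_map"][name]  # "model-00001-of-00004.safetensors" per the map above

with safe_open(shard, framework="pt") as shard_file:
    tensor = shard_file.get_tensor(name)  # reads just this tensor, not the whole shard

print(name, "->", shard, tuple(tensor.shape))
```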
preprocessor_config.json ADDED
@@ -0,0 +1,27 @@

1
+ {
2
+ "crop_to_patches": false,
3
+ "do_convert_rgb": true,
4
+ "do_normalize": true,
5
+ "do_rescale": true,
6
+ "do_resize": true,
7
+ "image_mean": [
8
+ 0.5,
9
+ 0.5,
10
+ 0.5
11
+ ],
12
+ "image_processor_type": "GotOcr2ImageProcessor",
13
+ "image_std": [
14
+ 0.5,
15
+ 0.5,
16
+ 0.5
17
+ ],
18
+ "max_patches": 12,
19
+ "min_patches": 1,
20
+ "processor_class": "AyaVisionProcessor",
21
+ "resample": 3,
22
+ "rescale_factor": 0.00392156862745098,
23
+ "size": {
24
+ "height": 364,
25
+ "width": 364
26
+ }
27
+ }
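preprocessor_config.json describes the image pipeline: convert to RGB, resize to 364×364 with bicubic resampling (resample=3), rescale by 1/255, and normalize with mean=std=0.5, which maps pixel values into [-1, 1]. An illustrative sketch of that arithmetic follows, assuming NumPy and Pillow and a placeholder image path; in practice AutoProcessor (via the GotOcr2ImageProcessor named above) applies these steps for you:

```python
# Illustrative only: the resize -> rescale -> normalize arithmetic from preprocessor_config.json.
# "example.jpg" is a placeholder path.
import numpy as np
from PIL import Image

img = Image.open("example.jpg").convert("RGB")   # do_convert_rgb
img = img.resize((364, 364), Image.BICUBIC)      # size 364x364, resample=3 (bicubic)

pixels = np.asarray(img).astype(np.float32)
pixels = pixels * 0.00392156862745098            # do_rescale: multiply by 1/255
pixels = (pixels - 0.5) / 0.5                    # do_normalize: mean=std=0.5 -> values in [-1, 1]
pixels = pixels.transpose(2, 0, 1)               # HWC -> CHW layout expected by the vision tower

print(pixels.shape, float(pixels.min()), float(pixels.max()))
```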
processor_config.json ADDED
@@ -0,0 +1,13 @@
1
+ {
2
+ "end_of_img_token": "<|END_OF_IMG|>",
3
+ "image_token": "<image>",
4
+ "img_line_break_token": "<|IMG_LINE_BREAK|>",
5
+ "img_patch_token": "<|IMG_PATCH|>",
6
+ "img_size": 364,
7
+ "patch_size": 28,
8
+ "processor_class": "AyaVisionProcessor",
9
+ "start_of_img_token": "<|START_OF_IMG|>",
10
+ "tile_global_token": "TILE_GLOBAL",
11
+ "tile_token": "TILE",
12
+ "vision_feature_select_strategy": "full"
13
+ }
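processor_config.json pairs a 364-pixel tile size with a 28-pixel patch size, so each 364×364 tile divides into 13 × 13 = 169 raw vision patches. The sketch below only checks that arithmetic; how many <|IMG_PATCH|> placeholders actually appear in a prompt, and how tiles are framed with <|START_OF_IMG|>/<|END_OF_IMG|> and TILE tokens, is decided by AyaVisionProcessor:

```python
# Arithmetic check only: raw patch geometry implied by processor_config.json and
# preprocessor_config.json. Any pooling of patches and the exact prompt layout are
# handled by AyaVisionProcessor / the model, not by this snippet.
img_size = 364        # "img_size" above
patch_size = 28       # "patch_size" above
max_patches = 12      # "max_patches" in preprocessor_config.json

patches_per_side = img_size // patch_size    # 13
patches_per_tile = patches_per_side ** 2     # 169 raw patches per 364x364 tile
print(patches_per_tile, patches_per_tile * max_patches)  # 169 per tile, 2028 across 12 tiles
```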
special_tokens_map.json ADDED
@@ -0,0 +1,23 @@
1
+ {
2
+ "bos_token": {
3
+ "content": "<BOS_TOKEN>",
4
+ "lstrip": false,
5
+ "normalized": false,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "eos_token": {
10
+ "content": "<|END_OF_TURN_TOKEN|>",
11
+ "lstrip": false,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "pad_token": {
17
+ "content": "<PAD>",
18
+ "lstrip": false,
19
+ "normalized": false,
20
+ "rstrip": false,
21
+ "single_word": false
22
+ }
23
+ }
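special_tokens_map.json sets <BOS_TOKEN> as BOS, <|END_OF_TURN_TOKEN|> as EOS, and <PAD> for padding. A quick, hedged way to confirm they are wired up once the tokenizer is loaded (assumes access to the gated CohereForAI/aya-vision-8b repo or a local copy of these files):

```python
# Sanity check that the special tokens declared above are exposed by the tokenizer.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("CohereForAI/aya-vision-8b")
print(tok.bos_token, tok.eos_token, tok.pad_token)  # <BOS_TOKEN> <|END_OF_TURN_TOKEN|> <PAD>

ids = tok("Hello")["input_ids"]
# Begins with <BOS_TOKEN> because add_bos_token is true in tokenizer_config.json below.
print(tok.convert_ids_to_tokens(ids))
```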
tokenizer.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d896f76efd3c9d3e0541ec0a0396a5669281c5667a1ac1d82001117625720ff4
3
+ size 20125695
tokenizer_config.json ADDED
@@ -0,0 +1,392 @@
1
+ {
2
+ "add_bos_token": true,
3
+ "add_eos_token": false,
4
+ "add_prefix_space": false,
5
+ "added_tokens_decoder": {
6
+ "0": {
7
+ "content": "<PAD>",
8
+ "lstrip": false,
9
+ "normalized": false,
10
+ "rstrip": false,
11
+ "single_word": false,
12
+ "special": true
13
+ },
14
+ "1": {
15
+ "content": "<UNK>",
16
+ "lstrip": false,
17
+ "normalized": false,
18
+ "rstrip": false,
19
+ "single_word": false,
20
+ "special": true
21
+ },
22
+ "2": {
23
+ "content": "<CLS>",
24
+ "lstrip": false,
25
+ "normalized": false,
26
+ "rstrip": false,
27
+ "single_word": false,
28
+ "special": true
29
+ },
30
+ "3": {
31
+ "content": "<SEP>",
32
+ "lstrip": false,
33
+ "normalized": false,
34
+ "rstrip": false,
35
+ "single_word": false,
36
+ "special": true
37
+ },
38
+ "4": {
39
+ "content": "<MASK_TOKEN>",
40
+ "lstrip": false,
41
+ "normalized": false,
42
+ "rstrip": false,
43
+ "single_word": false,
44
+ "special": true
45
+ },
46
+ "5": {
47
+ "content": "<BOS_TOKEN>",
48
+ "lstrip": false,
49
+ "normalized": false,
50
+ "rstrip": false,
51
+ "single_word": false,
52
+ "special": true
53
+ },
54
+ "6": {
55
+ "content": "<EOS_TOKEN>",
56
+ "lstrip": false,
57
+ "normalized": false,
58
+ "rstrip": false,
59
+ "single_word": false,
60
+ "special": true
61
+ },
62
+ "7": {
63
+ "content": "<EOP_TOKEN>",
64
+ "lstrip": false,
65
+ "normalized": false,
66
+ "rstrip": false,
67
+ "single_word": false,
68
+ "special": true
69
+ },
70
+ "255000": {
71
+ "content": "<|START_OF_TURN_TOKEN|>",
72
+ "lstrip": false,
73
+ "normalized": false,
74
+ "rstrip": false,
75
+ "single_word": false,
76
+ "special": false
77
+ },
78
+ "255001": {
79
+ "content": "<|END_OF_TURN_TOKEN|>",
80
+ "lstrip": false,
81
+ "normalized": false,
82
+ "rstrip": false,
83
+ "single_word": false,
84
+ "special": true
85
+ },
86
+ "255002": {
87
+ "content": "<|YES_TOKEN|>",
88
+ "lstrip": false,
89
+ "normalized": false,
90
+ "rstrip": false,
91
+ "single_word": false,
92
+ "special": false
93
+ },
94
+ "255003": {
95
+ "content": "<|NO_TOKEN|>",
96
+ "lstrip": false,
97
+ "normalized": false,
98
+ "rstrip": false,
99
+ "single_word": false,
100
+ "special": false
101
+ },
102
+ "255004": {
103
+ "content": "<|GOOD_TOKEN|>",
104
+ "lstrip": false,
105
+ "normalized": false,
106
+ "rstrip": false,
107
+ "single_word": false,
108
+ "special": false
109
+ },
110
+ "255005": {
111
+ "content": "<|BAD_TOKEN|>",
112
+ "lstrip": false,
113
+ "normalized": false,
114
+ "rstrip": false,
115
+ "single_word": false,
116
+ "special": false
117
+ },
118
+ "255006": {
119
+ "content": "<|USER_TOKEN|>",
120
+ "lstrip": false,
121
+ "normalized": false,
122
+ "rstrip": false,
123
+ "single_word": false,
124
+ "special": false
125
+ },
126
+ "255007": {
127
+ "content": "<|CHATBOT_TOKEN|>",
128
+ "lstrip": false,
129
+ "normalized": false,
130
+ "rstrip": false,
131
+ "single_word": false,
132
+ "special": false
133
+ },
134
+ "255008": {
135
+ "content": "<|SYSTEM_TOKEN|>",
136
+ "lstrip": false,
137
+ "normalized": false,
138
+ "rstrip": false,
139
+ "single_word": false,
140
+ "special": false
141
+ },
142
+ "255009": {
143
+ "content": "<|USER_0_TOKEN|>",
144
+ "lstrip": false,
145
+ "normalized": false,
146
+ "rstrip": false,
147
+ "single_word": false,
148
+ "special": false
149
+ },
150
+ "255010": {
151
+ "content": "<|USER_1_TOKEN|>",
152
+ "lstrip": false,
153
+ "normalized": false,
154
+ "rstrip": false,
155
+ "single_word": false,
156
+ "special": false
157
+ },
158
+ "255011": {
159
+ "content": "<|USER_2_TOKEN|>",
160
+ "lstrip": false,
161
+ "normalized": false,
162
+ "rstrip": false,
163
+ "single_word": false,
164
+ "special": false
165
+ },
166
+ "255012": {
167
+ "content": "<|USER_3_TOKEN|>",
168
+ "lstrip": false,
169
+ "normalized": false,
170
+ "rstrip": false,
171
+ "single_word": false,
172
+ "special": false
173
+ },
174
+ "255013": {
175
+ "content": "<|USER_4_TOKEN|>",
176
+ "lstrip": false,
177
+ "normalized": false,
178
+ "rstrip": false,
179
+ "single_word": false,
180
+ "special": false
181
+ },
182
+ "255014": {
183
+ "content": "<|USER_5_TOKEN|>",
184
+ "lstrip": false,
185
+ "normalized": false,
186
+ "rstrip": false,
187
+ "single_word": false,
188
+ "special": false
189
+ },
190
+ "255015": {
191
+ "content": "<|USER_6_TOKEN|>",
192
+ "lstrip": false,
193
+ "normalized": false,
194
+ "rstrip": false,
195
+ "single_word": false,
196
+ "special": false
197
+ },
198
+ "255016": {
199
+ "content": "<|USER_7_TOKEN|>",
200
+ "lstrip": false,
201
+ "normalized": false,
202
+ "rstrip": false,
203
+ "single_word": false,
204
+ "special": false
205
+ },
206
+ "255017": {
207
+ "content": "<|USER_8_TOKEN|>",
208
+ "lstrip": false,
209
+ "normalized": false,
210
+ "rstrip": false,
211
+ "single_word": false,
212
+ "special": false
213
+ },
214
+ "255018": {
215
+ "content": "<|USER_9_TOKEN|>",
216
+ "lstrip": false,
217
+ "normalized": false,
218
+ "rstrip": false,
219
+ "single_word": false,
220
+ "special": false
221
+ },
222
+ "255019": {
223
+ "content": "<|START_THINKING|>",
224
+ "lstrip": false,
225
+ "normalized": false,
226
+ "rstrip": false,
227
+ "single_word": false,
228
+ "special": false
229
+ },
230
+ "255020": {
231
+ "content": "<|END_THINKING|>",
232
+ "lstrip": false,
233
+ "normalized": false,
234
+ "rstrip": false,
235
+ "single_word": false,
236
+ "special": false
237
+ },
238
+ "255021": {
239
+ "content": "<|START_RESPONSE|>",
240
+ "lstrip": false,
241
+ "normalized": false,
242
+ "rstrip": false,
243
+ "single_word": false,
244
+ "special": true
245
+ },
246
+ "255022": {
247
+ "content": "<|END_RESPONSE|>",
248
+ "lstrip": false,
249
+ "normalized": false,
250
+ "rstrip": false,
251
+ "single_word": false,
252
+ "special": true
253
+ },
254
+ "255023": {
255
+ "content": "<|START_ACTION|>",
256
+ "lstrip": false,
257
+ "normalized": false,
258
+ "rstrip": false,
259
+ "single_word": false,
260
+ "special": false
261
+ },
262
+ "255024": {
263
+ "content": "<|END_ACTION|>",
264
+ "lstrip": false,
265
+ "normalized": false,
266
+ "rstrip": false,
267
+ "single_word": false,
268
+ "special": false
269
+ },
270
+ "255025": {
271
+ "content": "<|START_TOOL_RESULT|>",
272
+ "lstrip": false,
273
+ "normalized": false,
274
+ "rstrip": false,
275
+ "single_word": false,
276
+ "special": false
277
+ },
278
+ "255026": {
279
+ "content": "<|END_TOOL_RESULT|>",
280
+ "lstrip": false,
281
+ "normalized": false,
282
+ "rstrip": false,
283
+ "single_word": false,
284
+ "special": false
285
+ },
286
+ "255027": {
287
+ "content": "<|EXTRA_8_TOKEN|>",
288
+ "lstrip": false,
289
+ "normalized": false,
290
+ "rstrip": false,
291
+ "single_word": false,
292
+ "special": false
293
+ },
294
+ "255028": {
295
+ "content": "<|NEW_FILE|>",
296
+ "lstrip": false,
297
+ "normalized": false,
298
+ "rstrip": false,
299
+ "single_word": false,
300
+ "special": true
301
+ },
302
+ "255029": {
303
+ "content": "<|BEGINNING_OF_PREFIX_FIM_TOKEN|>",
304
+ "lstrip": false,
305
+ "normalized": false,
306
+ "rstrip": false,
307
+ "single_word": false,
308
+ "special": false
309
+ },
310
+ "255030": {
311
+ "content": "<|BEGINNING_OF_MIDDLE_FIM_TOKEN|>",
312
+ "lstrip": false,
313
+ "normalized": false,
314
+ "rstrip": false,
315
+ "single_word": false,
316
+ "special": false
317
+ },
318
+ "255031": {
319
+ "content": "<|BEGINNING_OF_SUFFIX_FIM_TOKEN|>",
320
+ "lstrip": false,
321
+ "normalized": false,
322
+ "rstrip": false,
323
+ "single_word": false,
324
+ "special": false
325
+ },
326
+ "255032": {
327
+ "content": "<|END_OF_MIDDLE_FIM_TOKEN|>",
328
+ "lstrip": false,
329
+ "normalized": false,
330
+ "rstrip": false,
331
+ "single_word": false,
332
+ "special": false
333
+ },
334
+ "255033": {
335
+ "content": "<|START_OF_IMG|>",
336
+ "lstrip": false,
337
+ "normalized": false,
338
+ "rstrip": false,
339
+ "single_word": false,
340
+ "special": false
341
+ },
342
+ "255034": {
343
+ "content": "<|END_OF_IMG|>",
344
+ "lstrip": false,
345
+ "normalized": false,
346
+ "rstrip": false,
347
+ "single_word": false,
348
+ "special": false
349
+ },
350
+ "255035": {
351
+ "content": "<|IMG_LINE_BREAK|>",
352
+ "lstrip": false,
353
+ "normalized": false,
354
+ "rstrip": false,
355
+ "single_word": false,
356
+ "special": false
357
+ },
358
+ "255036": {
359
+ "content": "<|IMG_PATCH|>",
360
+ "lstrip": false,
361
+ "normalized": false,
362
+ "rstrip": false,
363
+ "single_word": false,
364
+ "special": false
365
+ }
366
+ },
367
+ "bos_token": "<BOS_TOKEN>",
368
+ "chat_template": [
369
+ {
370
+ "name": "default",
371
+ "template": "{{ bos_token }}<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|># System Preamble\nYou are in contextual safety mode. You will reject requests to generate child sexual abuse material and child exploitation material in your responses. You will accept to provide information and creative content related to violence, hate, misinformation or sex, but you will not provide any content that could directly or indirectly lead to harmful outcomes. When analyzing images, carefully describe and interpret their content while avoiding any promotion of harm, misinformation, or bias.\n\nYou are Aya Vision, a vision-language model built by Cohere for AI. You have been trained on data in English, French, Spanish, Italian, German, Portuguese, Japanese, Korean, Modern Standard Arabic, Mandarin, Russian, Indonesian, Turkish, Dutch, Polish, Persian, Vietnamese, Czech, Hindi, Ukrainian, Romanian, Greek and Hebrew. You are capable of interpreting images, including describing them, answering questions about their contents, extracting textual information, and analyzing visual context. Your responses must maintain the highest standards of quality, accuracy, and safety.\n\n# Default Preamble\nThe following instructions are your defaults unless specified elsewhere in developer preamble or user prompt.\n- Your name is Aya Vision.\n- You are a large language model built by Cohere for AI.\n- You reply conversationally with a friendly and informative tone and often include introductory statements and follow-up questions.\n- If the input is ambiguous, ask clarifying follow-up questions.\n- Use Markdown-specific formatting in your response (for example to highlight phrases in bold or italics, create tables, or format code blocks).\n- Use LaTeX to generate mathematical notation for complex equations.\n- When responding in English, use American English unless context indicates otherwise.\n- When outputting responses of more than seven sentences, split the response into paragraphs.\n- Prefer the active voice.\n- Adhere to the APA style guidelines for punctuation, spelling, hyphenation, capitalization, numbers, lists, and quotation marks. 
Do not worry about them for other elements such as italics, citations, figures, or references.\n- Use gender-neutral pronouns for unspecified persons.\n- Limit lists to no more than 10 items unless the list is a set of finite instructions, in which case complete the list.\n- Use the third person when asked to write a summary.\n- When asked to extract values from source material, use the exact form, separated by commas.\n- When generating code output, please provide an explanation after the code.\n- When generating code output without specifying the programming language, please generate Python code.\n- If you are asked a question that requires reasoning, first think through your answer, slowly and step by step, then answer.\n<|END_OF_TURN_TOKEN|>\n{%- for message in messages -%}\n <|START_OF_TURN_TOKEN|>{{ message.role | replace(\"user\", \"<|USER_TOKEN|>\") | replace(\"assistant\", \"<|CHATBOT_TOKEN|><|START_RESPONSE|>\") | replace(\"system\", \"<|SYSTEM_TOKEN|>\") }}\n {%- if message.content is defined -%}\n {%- if message.content is string -%}\n{{ message.content }}\n {%- else -%}\n {%- for item in message.content | selectattr('type', 'equalto', 'image') -%}\n<image>\n {%- endfor -%}\n {%- for item in message.content | selectattr('type', 'equalto', 'text') -%}\n{{ item.text }}\n {%- endfor -%}\n {%- endif -%}\n {%- elif message.message is defined -%}\n {%- if message.message is string -%}\n{{ message.message }}\n {%- else -%}\n {%- for item in message.message | selectattr('type', 'equalto', 'image') -%}\n<image>\n {%- endfor -%}\n {%- for item in message.message | selectattr('type', 'equalto', 'text') -%}\n{{ item.text }}\n {%- endfor -%}\n {%- endif -%}\n {%- endif -%}\n {%- if message.role == \"assistant\" -%}\n<|END_RESPONSE|>\n {%- endif -%}\n<|END_OF_TURN_TOKEN|>\n{%- endfor -%}\n{%- if add_generation_prompt -%}\n<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>\n{%- endif -%}\n"
372
+ }
373
+ ],
374
+ "clean_up_tokenization_spaces": false,
375
+ "eos_token": "<|END_OF_TURN_TOKEN|>",
376
+ "extra_special_tokens": {},
377
+ "legacy": true,
378
+ "max_length": null,
379
+ "merges_file": null,
380
+ "model_max_length": 16384,
381
+ "pad_to_multiple_of": null,
382
+ "pad_token": "<PAD>",
383
+ "pad_token_type_id": 0,
384
+ "padding_side": "left",
385
+ "processor_class": "AyaVisionProcessor",
386
+ "sp_model_kwargs": {},
387
+ "spaces_between_special_tokens": false,
388
+ "tokenizer_class": "CohereTokenizer",
389
+ "unk_token": null,
390
+ "use_default_system_prompt": false,
391
+ "vocab_file": null
392
+ }
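The chat template above expects each message's content to be a list of {"type": "image"} and {"type": "text"} items and wraps every turn in <|START_OF_TURN_TOKEN|> / <|END_OF_TURN_TOKEN|> markers. Below is a hedged usage sketch that only renders the prompt text (tokenize=False), assuming a transformers release with Aya Vision support and access to the gated base repo; the image URL is a placeholder:

```python
# Hedged usage sketch: render (but do not tokenize) the chat template defined above.
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("CohereForAI/aya-vision-8b")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/photo.jpg"},  # placeholder URL
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    }
]

prompt = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
# The output starts with <BOS_TOKEN><|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|># System Preamble ...
print(prompt)
```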