teowu and nielsr (HF Staff) committed
Commit a889d39 · verified · 1 Parent(s): f5378d2

Add pipeline tag and library name (#1)

- Add pipeline tag and library name (7d2b463cd17417193035947aab72e0f6136c9ace)


Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>

Files changed (1):
  1. README.md +41 -40
README.md CHANGED
@@ -1,41 +1,42 @@
 ---
 license: mit
+pipeline_tag: image-feature-extraction
+library_name: transformers
 ---
-
 
 <div align="center">
 <img width="30%" src="figures/logo.png">
 </div>
 
 
 ## Introduction
 
 **MoonViT** is a Native-resolution Vision Encoder, which is initialized from and continually pre-trained on **SigLIP-SO-400M**.
 To facilitate the standalone use of MoonViT, we have separated the implementation and weights of MoonViT from [moonshotai/Kimi-VL-A3B-Instruct](https://huggingface.co/moonshotai/Kimi-VL-A3B-Instruct).
 
 If you are interested in the training process of MoonViT, you are welcome to read Paper [Kimi-VL Technical Report](https://huggingface.co/papers/2504.07491).
 
 ## Example usage
 
 ```python
 from PIL import Image
 from transformers import AutoModel, AutoImageProcessor
 
 model_path = "moonshotai/MoonViT-SO-400M"
 model = AutoModel.from_pretrained(
     model_path,
     torch_dtype="auto",
     device_map="auto",
     trust_remote_code=True,
 )
 processor = AutoImageProcessor.from_pretrained(model_path, trust_remote_code=True)
 
 image_path = "./figures/demo.png"
 image = Image.open(image_path)
 
 images_processed = processor(image, return_tensors="pt").to(dtype=model.dtype, device=model.device)
 image_features: list = model(images_processed.pixel_values, images_processed.image_grid_hws)
 
 print(f"dtype: {image_features[0].dtype}, shape: {image_features[0].shape}")
 # dtype: torch.bfloat16, shape: torch.Size([1092, 4, 1152])
 ```
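
The two keys this commit adds live in the README's YAML frontmatter, the block the Hub reads to file the model under a task and a loading library. As a minimal sketch of what the metadata block contains after the change, the flat `key: value` lines between the `---` markers can be picked out with the standard library alone (the `readme` string below is a hand-written stand-in, not fetched from the Hub):

```python
import re

# Stand-in for the README.md contents after this commit (not fetched from the Hub).
readme = """---
license: mit
pipeline_tag: image-feature-extraction
library_name: transformers
---

## Introduction
"""

# The metadata block is the text between the leading pair of '---' lines.
block = re.match(r"^---\n(.*?)\n---\n", readme, re.DOTALL).group(1)

# Every line here is a flat 'key: value' pair, so a full YAML parser is not needed.
metadata = dict(line.split(": ", 1) for line in block.splitlines())

print(metadata["pipeline_tag"])  # image-feature-extraction
print(metadata["library_name"])  # transformers
```

On the Hub, `pipeline_tag` places the model under the matching task filter and `library_name` tells the site which loading snippet to suggest; the hand parsing above is only to illustrate what the commit adds.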