Add pipeline tag, license and model checkpoints (#1)
Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>

README.md CHANGED:

```yaml
---
datasets:
- imagenet-1k
tags:
- mae
- crossmae
pipeline_tag: image-classification
library_name: pytorch
license: cc-by-nc-4.0
---
```

## CrossMAE: Rethinking Patch Dependence for Masked Autoencoders

This repo hosts the models for [CrossMAE: Rethinking Patch Dependence for Masked Autoencoders](https://arxiv.org/abs/2401.14391).

Please take a look at the [GitHub repo](https://github.com/TonyLianLong/CrossMAE) for instructions on pretraining, fine-tuning, and evaluation with these models.
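
The checkpoints in the table below can also be fetched programmatically. Here is a minimal sketch, assuming the `huggingface_hub` and `torch` packages and using the fine-tuned ViT-Base file as an example; the nested `"model"` key is an assumption based on common MAE-style checkpoints, not a documented format:

```python
# Minimal sketch (not the official loading code): download one checkpoint
# from this repo and inspect its contents with plain PyTorch.
import torch
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="longlian/CrossMAE",
    filename="vitb-mr0.75-kmr0.75-dd12/imagenet-mae-cross-vitb-finetune-wfm-mr0.75-kmr0.75-dd12-ep800-ui.pth",
)
# weights_only=False because MAE-style checkpoints often pickle extra
# metadata (epoch, training args) alongside the tensors.
ckpt = torch.load(path, map_location="cpu", weights_only=False)
# Assumption: weights are nested under a "model" key, as in MAE checkpoints.
state_dict = ckpt.get("model", ckpt)
print(len(state_dict), "entries; e.g.", next(iter(state_dict)))
```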

<table><tbody>
<!-- START TABLE -->
<!-- TABLE HEADER -->
<tr>
<th valign="bottom"></th>
<th valign="bottom">ViT-Small</th>
<th valign="bottom">ViT-Base</th>
<th valign="bottom">ViT-Base<sub>448</sub></th>
<th valign="bottom">ViT-Large</th>
<th valign="bottom">ViT-Huge</th>
</tr>
<!-- TABLE BODY -->
<tr><td align="left">pretrained checkpoint</td>
<td align="center"><a href='https://huggingface.co/longlian/CrossMAE/resolve/main/vits-mr0.75-kmr0.75-dd12/imagenet-mae-cross-vits-pretrain-wfm-mr0.75-kmr0.75-dd12-ep800-ui.pth?download=true'>download</a></td>
<td align="center"><a href='https://huggingface.co/longlian/CrossMAE/resolve/main/vitb-mr0.75-kmr0.75-dd12/imagenet-mae-cross-vitb-pretrain-wfm-mr0.75-kmr0.75-dd12-ep800-ui.pth?download=true'>download</a></td>
<td align="center"><a href='https://huggingface.co/longlian/CrossMAE/resolve/main/vitb-mr0.75-kmr0.75-dd12-448-400/imagenet-mae-cross-vitb-pretrain-wfm-mr0.75-kmr0.25-dd12-ep400-ui-res-448.pth?download=true'>download</a></td>
<td align="center"><a href='https://huggingface.co/longlian/CrossMAE/resolve/main/vitl-mr0.75-kmr0.75-dd12/imagenet-mae-cross-vitl-pretrain-wfm-mr0.75-kmr0.75-dd12-ep800-ui.pth?download=true'>download</a></td>
<td align="center"><a href='https://huggingface.co/longlian/CrossMAE/resolve/main/vith-mr0.75-kmr0.25-dd12/imagenet-mae-cross-vith-pretrain-wfm-mr0.75-kmr0.25-dd12-ep800-ui.pth?download=true'>download</a></td>
</tr>
<tr><td align="left">fine-tuned checkpoint</td>
<td align="center"><a href='https://huggingface.co/longlian/CrossMAE/resolve/main/vits-mr0.75-kmr0.75-dd12/imagenet-mae-cross-vits-finetune-wfm-mr0.75-kmr0.75-dd12-ep800-ui.pth?download=true'>download</a></td>
<td align="center"><a href='https://huggingface.co/longlian/CrossMAE/resolve/main/vitb-mr0.75-kmr0.75-dd12/imagenet-mae-cross-vitb-finetune-wfm-mr0.75-kmr0.75-dd12-ep800-ui.pth?download=true'>download</a></td>
<td align="center"><a href='https://huggingface.co/longlian/CrossMAE/resolve/main/vitb-mr0.75-kmr0.75-dd12-448-400/imagenet-mae-cross-vitb-finetune-wfm-mr0.75-kmr0.25-dd12-ep400-ui-res-448.pth?download=true'>download</a></td>
<td align="center"><a href='https://huggingface.co/longlian/CrossMAE/resolve/main/vitl-mr0.75-kmr0.75-dd12/imagenet-mae-cross-vitl-finetune-wfm-mr0.75-kmr0.75-dd12-ep800-ui.pth?download=true'>download</a></td>
<td align="center"><a href='https://huggingface.co/longlian/CrossMAE/resolve/main/vith-mr0.75-kmr0.25-dd12/imagenet-mae-cross-vith-finetune-wfm-mr0.75-kmr0.25-dd12-ep800-ui.pth?download=true'>download</a></td>
</tr>
<tr><td align="left"><b>CrossMAE ImageNet top-1 accuracy, % (ours)</b></td>
<td align="center"><b>79.318</b></td>
<td align="center"><b>83.722</b></td>
<td align="center"><b>84.598</b></td>
<td align="center"><b>85.432</b></td>
<td align="center"><b>86.256</b></td>
</tr>
<tr><td align="left">MAE ImageNet top-1 accuracy, % (baseline)</td>
<td align="center"></td>
<td align="center"></td>
<td align="center">84.8</td>
<td align="center"></td>
<td align="center">85.9</td>
</tr>
</tbody></table>
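
Given the `pipeline_tag: image-classification` metadata, the fine-tuned checkpoints should behave as plain ViT classifiers. The sketch below loads the ViT-Base weights into a `timm` model; the architecture name (`vit_base_patch16_224`), the global-average-pooling head, and the `"model"` key are assumptions based on MAE's fine-tuning recipe, so treat this as a starting point and see the GitHub repo for the official evaluation code:

```python
# Hedged sketch: run the fine-tuned ViT-Base checkpoint as an ImageNet
# classifier via timm. strict=False reports mismatched keys instead of
# failing if the assumed layout is off.
import timm
import torch
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="longlian/CrossMAE",
    filename="vitb-mr0.75-kmr0.75-dd12/imagenet-mae-cross-vitb-finetune-wfm-mr0.75-kmr0.75-dd12-ep800-ui.pth",
)
ckpt = torch.load(path, map_location="cpu", weights_only=False)
state_dict = ckpt.get("model", ckpt)  # assumed MAE-style nesting under "model"

# Assumption: MAE fine-tuning uses global average pooling, not the CLS token.
model = timm.create_model("vit_base_patch16_224", num_classes=1000, global_pool="avg")
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print("missing:", missing, "unexpected:", unexpected)

model.eval()
with torch.no_grad():
    # Dummy input; replace with a real 224x224 ImageNet-normalized image.
    logits = model(torch.randn(1, 3, 224, 224))
print("predicted class index:", logits.argmax(dim=-1).item())
```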

## Citation

Please give us a star 🌟 on GitHub to support us!

Please cite our work if you find it inspiring or use our code in your work:
```bibtex
@article{fu2025rethinking,
  title={Rethinking Patch Dependence for Masked Autoencoders},
  author={Letian Fu and Long Lian and Renhao Wang and Baifeng Shi and XuDong Wang and Adam Yala and Trevor Darrell and Alexei A Efros and Ken Goldberg},
  journal={Transactions on Machine Learning Research},
  issn={2835-8856},
  year={2025},
  url={https://openreview.net/forum?id=JT2KMuo2BV},
}
```