longlian and nielsr (HF Staff) committed
Commit 3268767 · verified · 1 Parent(s): eba5ef4

Add pipeline tag, license and model checkpoints (#1)


- Add pipeline tag, license and model checkpoints (3a7e5df32fdef44f9b2fdab723569a20309da1be)


Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +62 -2
README.md CHANGED
@@ -1,9 +1,12 @@
 ---
+datasets:
+- imagenet-1k
 tags:
 - mae
 - crossmae
-datasets:
-- imagenet-1k
+pipeline_tag: image-classification
+library_name: pytorch
+license: cc-by-nc-4.0
 ---
 
 ## CrossMAE: Rethinking Patch Dependence for Masked Autoencoders
@@ -19,3 +22,60 @@ by <a href="https://max-fu.github.io">Letian Fu*</a>, <a href="https://tonylian.
 This repo has the models for [CrossMAE: Rethinking Patch Dependence for Masked Autoencoders](https://arxiv.org/abs/2401.14391).
 
 Please take a look at the [GitHub repo](https://github.com/TonyLianLong/CrossMAE) to see instructions on pretraining, fine-tuning, and evaluation with these models.
+
+<table><tbody>
+<!-- START TABLE -->
+<!-- TABLE HEADER -->
+<th valign="bottom"></th>
+<th valign="bottom">ViT-Small</th>
+<th valign="bottom">ViT-Base</th>
+<th valign="bottom">ViT-Base<sub>448</sub></th>
+<th valign="bottom">ViT-Large</th>
+<th valign="bottom">ViT-Huge</th>
+<!-- TABLE BODY -->
+<tr><td align="left">pretrained checkpoint</td>
+<td align="center"><a href='https://huggingface.co/longlian/CrossMAE/resolve/main/vits-mr0.75-kmr0.75-dd12/imagenet-mae-cross-vits-pretrain-wfm-mr0.75-kmr0.75-dd12-ep800-ui.pth?download=true'>download</a></td>
+<td align="center"><a href='https://huggingface.co/longlian/CrossMAE/resolve/main/vitb-mr0.75-kmr0.75-dd12/imagenet-mae-cross-vitb-pretrain-wfm-mr0.75-kmr0.75-dd12-ep800-ui.pth?download=true'>download</a></td>
+<td align="center"><a href='https://huggingface.co/longlian/CrossMAE/resolve/main/vitb-mr0.75-kmr0.75-dd12-448-400/imagenet-mae-cross-vitb-pretrain-wfm-mr0.75-kmr0.25-dd12-ep400-ui-res-448.pth?download=true'>download</a></td>
+<td align="center"><a href='https://huggingface.co/longlian/CrossMAE/resolve/main/vitl-mr0.75-kmr0.75-dd12/imagenet-mae-cross-vitl-pretrain-wfm-mr0.75-kmr0.75-dd12-ep800-ui.pth?download=true'>download</a></td>
+<td align="center"><a href='https://huggingface.co/longlian/CrossMAE/resolve/main/vith-mr0.75-kmr0.25-dd12/imagenet-mae-cross-vith-pretrain-wfm-mr0.75-kmr0.25-dd12-ep800-ui.pth?download=true'>download</a></td>
+</tr>
+<tr><td align="left">fine-tuned checkpoint</td>
+<td align="center"><a href='https://huggingface.co/longlian/CrossMAE/resolve/main/vits-mr0.75-kmr0.75-dd12/imagenet-mae-cross-vits-finetune-wfm-mr0.75-kmr0.75-dd12-ep800-ui.pth?download=true'>download</a></td>
+<td align="center"><a href='https://huggingface.co/longlian/CrossMAE/resolve/main/vitb-mr0.75-kmr0.75-dd12/imagenet-mae-cross-vitb-finetune-wfm-mr0.75-kmr0.75-dd12-ep800-ui.pth?download=true'>download</a></td>
+<td align="center"><a href='https://huggingface.co/longlian/CrossMAE/resolve/main/vitb-mr0.75-kmr0.75-dd12-448-400/imagenet-mae-cross-vitb-finetune-wfm-mr0.75-kmr0.25-dd12-ep400-ui-res-448.pth?download=true'>download</a></td>
+<td align="center"><a href='https://huggingface.co/longlian/CrossMAE/resolve/main/vitl-mr0.75-kmr0.75-dd12/imagenet-mae-cross-vitl-finetune-wfm-mr0.75-kmr0.75-dd12-ep800-ui.pth?download=true'>download</a></td>
+<td align="center"><a href='https://huggingface.co/longlian/CrossMAE/resolve/main/vith-mr0.75-kmr0.25-dd12/imagenet-mae-cross-vith-finetune-wfm-mr0.75-kmr0.25-dd12-ep800-ui.pth?download=true'>download</a></td>
+</tr>
+<tr><td align="left"><b>Reference ImageNet accuracy (ours)</b></td>
+<td align="center"><b>79.318</b></td>
+<td align="center"><b>83.722</b></td>
+<td align="center"><b>84.598</b></td>
+<td align="center"><b>85.432</b></td>
+<td align="center"><b>86.256</b></td>
+</tr>
+<tr><td align="left">MAE ImageNet accuracy (baseline)</td>
+<td align="center"></td>
+<td align="center"></td>
+<td align="center">84.8</td>
+<td align="center"></td>
+<td align="center">85.9</td>
+</tr>
+</tbody></table>
+
+## Citation
+Please give us a star 🌟 on GitHub to support us!
+
+Please cite our work if you find it inspiring or use our code in your work:
+```
+@article{
+fu2025rethinking,
+title={Rethinking Patch Dependence for Masked Autoencoders},
+author={Letian Fu and Long Lian and Renhao Wang and Baifeng Shi and XuDong Wang and Adam Yala and Trevor Darrell and Alexei A Efros and Ken Goldberg},
+journal={Transactions on Machine Learning Research},
+issn={2835-8856},
+year={2025},
+url={https://openreview.net/forum?id=JT2KMuo2BV},
+note={}
+}
+```
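
The checkpoints in the table are plain `.pth` files hosted in this repo. Below is a minimal sketch of downloading and inspecting one with `huggingface_hub` and PyTorch, assuming the usual MAE-style checkpoint layout; the ViT model definitions needed to actually run the weights live in the GitHub repo linked above and are not reproduced here.

```python
import torch
from huggingface_hub import hf_hub_download

# Download one checkpoint from this repo; the path matches the fine-tuned
# ViT-Base link in the table above.
ckpt_path = hf_hub_download(
    repo_id="longlian/CrossMAE",
    filename="vitb-mr0.75-kmr0.75-dd12/imagenet-mae-cross-vitb-finetune-wfm-mr0.75-kmr0.75-dd12-ep800-ui.pth",
)

# The .pth files are ordinary torch checkpoints. weights_only=False because
# MAE-style checkpoints often bundle training metadata alongside the weights.
ckpt = torch.load(ckpt_path, map_location="cpu", weights_only=False)
state_dict = ckpt.get("model", ckpt)  # weights are usually nested under "model"
print(f"{len(state_dict)} tensors; example key: {next(iter(state_dict))}")

# To run the model, instantiate the matching ViT from the CrossMAE GitHub repo
# and load the weights with model.load_state_dict(state_dict).
```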