Downloading weights without duplicates
Downloading with regular git/git-lfs keeps a second copy of every file in .git/lfs/objects, which is a lot of extra space for ~680 GB of weight files:
sudo apt-get install git-lfs
git lfs install
git clone https://huggingface.co/deepseek-ai/DeepSeek-V3-0324
du -sh DeepSeek-V3-0324
# 1.3T  DeepSeek-V3-0324/
du -sh DeepSeek-V3-0324/.git/lfs
# 642G  DeepSeek-V3-0324/.git/lfs
How do I download the weight files without any duplication?
Would huggingface_hub.snapshot_download avoid producing duplicates or any extra cache? (I'm worried about the cache described in https://huggingface.co/docs/huggingface_hub/en/guides/manage-cache.)
pip install hf_transfer "huggingface_hub[hf_transfer]"
HF_HUB_ENABLE_HF_TRANSFER=1 python -c '
import huggingface_hub
huggingface_hub.snapshot_download(
    repo_id="deepseek-ai/DeepSeek-V3-0324",
    local_dir="deepseek-ai/DeepSeek-V3-0324",
    allow_patterns=["*.safetensors"],
)
'
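To confirm that this writes the weights only into local_dir and does not also fill the shared cache, a quick size check can be run afterwards (assuming the default cache location ~/.cache/huggingface; adjust if HF_HOME points elsewhere):
du -sh deepseek-ai/DeepSeek-V3-0324      # the downloaded weights
du -sh ~/.cache/huggingface 2>/dev/null  # should not grow by another ~640 GB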
Just using huggingface-cli download deepseek-ai/DeepSeek-V3-0324 should be fine.
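For example, restricting the download to the safetensors shards and writing them into a plain local directory might look like this (--include and --local-dir are standard huggingface-cli download options; the target directory name here is just illustrative):
huggingface-cli download deepseek-ai/DeepSeek-V3-0324 \
    --include "*.safetensors" \
    --local-dir DeepSeek-V3-0324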
Thanks! Adding a note about this directly in the README would be very helpful for novices.
Since DeepSeek is one of the really big open models, a warning that git clone https://huggingface.co/deepseek-ai/DeepSeek-V3-0324
duplicates the 642 GB of weights, together with a command for a fast, non-duplicating download, would be very useful.