T5 GGUF Incompatible with llama.cpp
The gguf T5 text encoder in this repository seems to use a completely non-standard state dict format. My general recommendation is not to create invalid files like this for models that are already supported by llama.cpp, as it causes problems with the rest of the ecosystem.
HiDream uses the same T5 XXL encoder as Flux, which is fully supported in llama.cpp and has plenty of available quantized weights.
Also, the T5 model from this repository uses Q4_0 for all weights, including important ones that should be kept in Q8_0 or F16, which most likely degrades quality.
Is there any reason not to just upload one of the models from chatpig/t5-v1_1-xxl-encoder-fp32-gguf or link to it?
The new format cannot be used with ComfyUI-GGUF (which only accepts llama.cpp-compatible text encoders), so it forces people to use your node pack and creates confusion.
The file "llama-q2_k.gguf" is a completely standard file on the other hand, converted with llama.cpp since it has all the metadata. It works in ComfyUI-GGUF.
sure, upload it right away; thanks
Thank you! I'll see if I can add a special case for the CLIP models from here so people won't have to re-download the safetensors variants. They seem to be FP16 without any quantized parts, so I think I can treat them the same way as safetensors for now.
The gguf T5 text encoder in this repository seems to use a completely non-standard state dict format. My general recommendation is not to create invalid files like this for models that are already supported by llama.cpp, as it causes problems with the rest of the ecosystem.
HiDream uses the same T5 XXL encoder as Flux, which is fully supported in llama.cpp and has plenty of available quantized weights.
Also, the T5 model from this repository uses Q4_0 for all weights, including important ones that should be kept in Q8_0 or F16, which most likely degrades quality.
Oh, we discussed that issue; it won't degrade the quality since it works exactly like safetensors. The llama.cpp hf-to-gguf conversion process renames all the tensor headers, which makes the reverse conversion very difficult. Good that you noticed: the new version can be converted back to safetensors with the convertor (reverse) and it still works, but the llama.cpp version can't.
We understand that all the tensors will be read by torch in ComfyUI eventually; such a separate encoder has no meaning in, and will never be used by, the llama.cpp ecosystem, so no worries. That's the key point and the benefit.
it won't degrade the quality since it works exactly like safetensors
You're storing the token embedding in Q4_0. Llama.cpp has rules in place for specific tensors, that being one of them, similar to how we need to keep some tensors in higher precision for image/video models as well.
Storing the token embedding in super low precision like that will affect quality.
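A quick way to check this is to list the per-tensor quantization types. A minimal sketch, assuming the gguf-py package from llama.cpp and an illustrative file name:

```python
from gguf import GGUFReader

reader = GGUFReader("t5xxl-q4_0.gguf")
for tensor in reader.tensors:
    # tensor_type is a GGMLQuantizationType enum member (F16, Q8_0, Q4_0, ...)
    print(f"{tensor.name:60s} {tensor.tensor_type.name}")
```

In llama.cpp-produced quants, sensitive tensors such as the token embedding are normally kept at a higher precision than the bulk of the weights.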
The llama.cpp hf-to-gguf conversion process renames all the tensor headers, which makes the reverse conversion very difficult
They're renamed for a reason: the GGUF ecosystem is primarily used by llama.cpp, which relies on a standard naming scheme when constructing the compute graph.
For reference, here is the code for the llama.cpp -> HF mapping; the HF -> llama.cpp mapping can be found in convert_hf_to_gguf.py in the llama.cpp repo, or by applying the logic above in reverse.
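As an illustration only, a few of the llama.cpp <-> HuggingFace T5 encoder key pairs, mapped back with simple substitutions. The authoritative tables live in llama.cpp's convert_hf_to_gguf.py / gguf-py tensor_mapping.py and in ComfyUI-GGUF; treat the exact names below as assumptions from memory, not the real mapping code.

```python
import re

# Illustrative subset of llama.cpp -> HuggingFace T5 encoder key patterns.
LLAMACPP_TO_HF = [
    (r"^token_embd\.weight$",          "shared.weight"),
    (r"^enc\.output_norm\.weight$",    "encoder.final_layer_norm.weight"),
    (r"^enc\.blk\.(\d+)\.attn_q\.",    r"encoder.block.\1.layer.0.SelfAttention.q."),
    (r"^enc\.blk\.(\d+)\.attn_k\.",    r"encoder.block.\1.layer.0.SelfAttention.k."),
    (r"^enc\.blk\.(\d+)\.attn_v\.",    r"encoder.block.\1.layer.0.SelfAttention.v."),
    (r"^enc\.blk\.(\d+)\.attn_o\.",    r"encoder.block.\1.layer.0.SelfAttention.o."),
    (r"^enc\.blk\.(\d+)\.attn_norm\.", r"encoder.block.\1.layer.0.layer_norm."),
    (r"^enc\.blk\.(\d+)\.ffn_norm\.",  r"encoder.block.\1.layer.1.layer_norm."),
]

def to_hf_key(name: str) -> str:
    for pattern, repl in LLAMACPP_TO_HF:
        new, n = re.subn(pattern, repl, name)
        if n:
            return new
    return name  # pass through anything not listed above

print(to_hf_key("enc.blk.3.attn_q.weight"))
# -> encoder.block.3.layer.0.SelfAttention.q.weight
```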
We understand that all the tensors will be read by torch in ComfyUI eventually; such a separate encoder has no meaning in, and will never be used by, the llama.cpp ecosystem, so no worries
As far as I see it, every GGUF file that stores a text model is part of the llama.cpp ecosystem. The only time it is used as a storage medium for a non-text model is for image and video models. For text models that are unsupported by llama.cpp I could see it making sense, but T5 has been supported in llama.cpp since before the GGUF file format was used for image models.
Even if you leave all checks off and let users convert those models locally, uploading them to a repository like this just creates confusion and vendor lock-in in the long run, as mentioned above.
The main ComfyUI-GGUF node pack always verifies the model architecture before loading, as all models are tested for compatibility before release. Bypassing that check is not an option, so this would mean we would have to add extra code paths each time you upload a custom format like this.
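For illustration, an architecture check of this kind boils down to something like the sketch below. The whitelist contents and helper name here are assumptions for the example, not the actual ComfyUI-GGUF code, and the field-decoding details may vary between gguf-py versions.

```python
from gguf import GGUFReader

# Illustrative whitelist; the real TXT_ARCH_LIST lives in ComfyUI-GGUF.
TXT_ARCH_LIST = {"t5", "t5encoder", "llama", "gemma2"}

def check_text_encoder_arch(path: str) -> str:
    reader = GGUFReader(path)
    field = reader.fields.get("general.architecture")
    if field is None:
        raise ValueError(f"{path}: no general.architecture key in metadata")
    arch = field.parts[field.data[0]].tobytes().decode("utf-8")
    if arch not in TXT_ARCH_LIST:
        raise ValueError(f"{path}: unsupported text encoder architecture {arch!r}")
    return arch
```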
Both of you have valid points; I think it's better to keep both compatibility and possibility. Personally, I have no further comment 🤣
btw @city96 could you try this t5xxl provided by @chatpig? If it works, then you don't need to add extra code each time; I guess just including "pig" in TXT_ARCH_LIST should be fine ❤️️
It works the same way the Q4_0 version works, if I disable the architecture check.
Adding "pig" to the arch list would mean no checking on what the model is, since your header doesn't include any information about the model architecture. As this is not something we can do, the next best thing would be adding the detection logic to detect and allow your custom T5 format, but yeah, that goes back to the same issue lol.
Currently, for image models with "pig" as the architecture, we invoke the model detection code used in convert.py to detect what the model type is. As all text encoders are either safetensors (since CLIP is small enough even at full precision) or already supported by llama.cpp (T5, llama, gemma), there's no detection code in place that we can fall back to.
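As a hedged sketch of what such a fallback could look like: guess the text encoder family from its state-dict key prefixes instead of trusting the architecture tag. The prefixes and return labels are illustrative, not ComfyUI-GGUF's actual detection code.

```python
def guess_text_encoder(keys):
    keys = set(keys)
    if any(k.startswith("enc.blk.") for k in keys):
        return "t5 (llama.cpp key layout)"
    if any(k.startswith("encoder.block.") for k in keys):
        return "t5 (HuggingFace key layout)"
    if any(k.startswith("text_model.encoder.layers.") for k in keys):
        return "clip"
    return "unknown"

print(guess_text_encoder(["encoder.block.0.layer.0.SelfAttention.q.weight"]))
# -> t5 (HuggingFace key layout)
```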
pig is essentially safetensors; if "pig" is in TXT_ARCH_LIST, the gguf versions of clip-g and clip-l will work as well
The q4_1 file's architecture (in the metadata) is t5; it will pass your architecture check and jump to your t5 or t5encoder logic, but it works even with q4_0's tensor structure, which means the final destination is torch, not llama.cpp. The tag might not reflect the actual precision sometimes; you should judge by file size and the actual test results.
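On the "judge it by file size" point, a rough back-of-the-envelope sketch: approximate bytes per weight for common GGUF quant types (block overhead included) applied to an approximate T5-XXL encoder parameter count. The figures are ballpark estimates, not exact values.

```python
# Approximate storage cost per weight for common GGUF quant types.
BYTES_PER_WEIGHT = {
    "F16":  2.0,
    "Q8_0": 8.5 / 8,  # 8-bit values + per-block scale
    "Q4_1": 5.0 / 8,  # 4-bit values + per-block scale and min
    "Q4_0": 4.5 / 8,  # 4-bit values + per-block scale
}

n_params = 4.8e9  # rough parameter count of the T5-XXL encoder
for quant, bpw in BYTES_PER_WEIGHT.items():
    print(f"{quant}: ~{n_params * bpw / 1e9:.1f} GB")
```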
It works the same way the Q4_0 version works, if I disable the architecture check.
don't need to disable the architecture check; the architecture is set to "t5" already
don't need to disable the architecture check; the architecture is set to "t5" already
Ah right, I missed that lol. Yeah, I guess that'd bypass the code check, though the model would still be non-standard for all intents and purposes.
pig is essentially safetensors;
That doesn't really change the fact that most software expects it to be in the established key format, though. For example, it doesn't work with transformers, which would break stuff like SDNext.
Closing for now, but I do still think you should consider using the intended key format so downstream use cases don't break.