T5 GGUF Incompatible with llama.cpp

#2
by city96 - opened

The gguf T5 text encoder in this repository seems to use a completely non-standard state dict format. My general recommendation is not to create invalid files like this for models that are already supported by llama.cpp, as it causes problems with the rest of the ecosystem.

HiDream uses the same T5 XXL encoder as Flux, which is fully supported in llama.cpp and has plenty of available quantized weights.

Also, the T5 model from this repository uses Q4_0 for all weights, including important ones which should be kept in Q8_0 or F16, which most likely degrades quality.

thanks @city96; there are versions compatible with the llama.cpp hf-to-gguf t5encoder as well, such as your version and this version; the pig architecture is @chatpig's work; as long as it works, i don't have any comment 🐷👀💧

Is there any reason not to just upload one of the models from chatpig/t5-v1_1-xxl-encoder-fp32-gguf or link to it?

The new format cannot be used with ComfyUI-GGUF (which only accepts llama.cpp-compatible text encoders), so it forces people to use your node pack and creates confusion.

The file "llama-q2_k.gguf" is a completely standard file on the other hand, converted with llama.cpp since it has all the metadata. It works in ComfyUI-GGUF.

sure, upload it right away; thanks

Thank you! I'll see if I can add a special case for the CLIP models from here so people won't have to re-download the safetensor variants. They seem to be FP16 without using any quantized parts so I think I can treat them the same way as safetensors for now.

The gguf T5 text encoder in this repository seems to use a completely non-standard state dict format. My general recommendation is not to create invalid files like this for models that are already supported by llama.cpp, as it causes problems with the rest of the ecosystem.

HiDream uses the same T5 XXL encoder as Flux, which is fully supported in llama.cpp and has plenty of available quantized weights.

Also, the T5 model from this repository uses Q4_0 for all weights, including important ones which should be kept in Q8_0 or F16, which most likely degrades quality.

Oh, we discussed that issue; it won't degrade the quality since it works exactly like safetensors; the llama.cpp hf-to-gguf conversion process renames all the tensor headers, which makes reverse conversion very difficult; good that you noticed: the new version can be converted back to safetensors with the convertor (reverse) and it still works, but the llama.cpp version can't; we understand that all the tensors will eventually be read by torch in ComfyUI, so a separate encoder like this has no meaning in, and will never be used by, the llama.cpp ecosystem, so no worries; that's the key point and the benefit
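
(For readers following along: "works exactly like safetensors" here means the tensors keep their safetensors-style names and are simply stored in a GGUF container. A minimal sketch of reading such a file back into a torch state dict with the gguf-py package might look like the following; the dequantize call and shape handling are simplified assumptions that may need adjusting for a given gguf-py version, and this is not the repository's actual convertor.)

```python
# Sketch only: read a GGUF whose tensor names already match the safetensors
# layout and dequantize everything back into a torch state dict.
import gguf
from gguf import quants  # dequantize() lives here in recent gguf-py releases
import numpy as np
import torch

def gguf_to_state_dict(path: str) -> dict:
    reader = gguf.GGUFReader(path)
    state_dict = {}
    for t in reader.tensors:
        arr = np.asarray(t.data)
        if t.tensor_type not in (gguf.GGMLQuantizationType.F32,
                                 gguf.GGMLQuantizationType.F16):
            arr = quants.dequantize(arr, t.tensor_type)
        # GGUF stores dims in reverse order compared to torch/safetensors;
        # shape handling is simplified here.
        shape = tuple(int(d) for d in reversed(t.shape))
        state_dict[t.name] = torch.from_numpy(
            np.ascontiguousarray(arr, dtype=np.float32)).reshape(shape)
    return state_dict
```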

it won't degrade the quality since it works exactly like safetensors

You're storing the token embedding in Q4_0. Llama.cpp has rules in place for specific tensors, that being one of them, similar to how we need to keep some tensors at higher precision for image/video models as well.
Storing the token embedding in super low precision like that will affect quality.
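
(To make the precision rules concrete, here is a minimal sketch of the kind of per-tensor policy being described. It mirrors the idea behind llama.cpp's rules, not their exact implementation; the tensor names and type choices are illustrative assumptions.)

```python
# Illustrative per-tensor quantization policy: keep quality-sensitive tensors
# (token embedding, output head, norms) at higher precision instead of Q4_0.
import gguf

def pick_qtype(name: str,
               default: gguf.GGMLQuantizationType = gguf.GGMLQuantizationType.Q4_0
               ) -> gguf.GGMLQuantizationType:
    if "norm" in name or name.endswith(".bias"):
        return gguf.GGMLQuantizationType.F32   # tiny tensors, keep full precision
    if name in ("token_embd.weight", "output.weight"):
        return gguf.GGMLQuantizationType.Q8_0  # quality-sensitive, keep at Q8_0 or better
    return default
```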


the llama.cpp hf-to-gguf conversion process renames all the tensor headers, which makes reverse conversion very difficult

They're renamed for a reason: the gguf ecosystem is primarily used for llama.cpp, which uses a standard naming scheme when constructing the compute graph.
For reference, here is the code for the llama.cpp -> hf mapping; the hf -> llama.cpp mapping can be found in convert_hf_to_gguf.py in the llama.cpp repo, or by doing the logic above in reverse.
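
(As a rough illustration of what that mapping covers for the T5 encoder, the patterns below show approximate llama.cpp to HF/transformers names, written from memory; the authoritative tables are in gguf-py's tensor_mapping.py and llama.cpp's convert_hf_to_gguf.py.)

```python
# Approximate llama.cpp -> HF name patterns for the T5 encoder ({b} = block
# index). Illustrative only; check tensor_mapping.py for the real mapping.
LLAMA_CPP_TO_HF_T5 = {
    "token_embd":             "shared",
    "enc.output_norm":        "encoder.final_layer_norm",
    "enc.blk.{b}.attn_q":     "encoder.block.{b}.layer.0.SelfAttention.q",
    "enc.blk.{b}.attn_k":     "encoder.block.{b}.layer.0.SelfAttention.k",
    "enc.blk.{b}.attn_v":     "encoder.block.{b}.layer.0.SelfAttention.v",
    "enc.blk.{b}.attn_o":     "encoder.block.{b}.layer.0.SelfAttention.o",
    "enc.blk.{b}.attn_rel_b": "encoder.block.{b}.layer.0.SelfAttention.relative_attention_bias",
    "enc.blk.{b}.attn_norm":  "encoder.block.{b}.layer.0.layer_norm",
    "enc.blk.{b}.ffn_gate":   "encoder.block.{b}.layer.1.DenseReluDense.wi_0",
    "enc.blk.{b}.ffn_up":     "encoder.block.{b}.layer.1.DenseReluDense.wi_1",
    "enc.blk.{b}.ffn_down":   "encoder.block.{b}.layer.1.DenseReluDense.wo",
    "enc.blk.{b}.ffn_norm":   "encoder.block.{b}.layer.1.layer_norm",
}
```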

we understand that all the tensors will eventually be read by torch in ComfyUI, so a separate encoder like this has no meaning in, and will never be used by, the llama.cpp ecosystem, so no worries

As far as I see it, every gguf file that stores a text model is part of the llama.cpp ecosystem. The only time it is used as a storage medium for a non-text model is for image and video models. For text models that are unsupported by llama.cpp I could see it making sense, but T5 has been supported in llama.cpp since even before the GGUF file format was used for image models.

Even if you leave all checks off and let users convert those models locally, uploading them to a repository like this just creates confusion and vendor lock-in in the long run, as mentioned above.
The main ComfyUI-GGUF node pack always verifies the model architecture before loading, as all models are tested for compatibility before release. Bypassing that check is not an option, so this would mean we would have to add extra code paths each time you upload a custom format like this.
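
(For context, the check being described boils down to reading general.architecture from the GGUF header and comparing it against an allow-list, roughly as sketched below. The arch list and field decoding are simplified assumptions, not ComfyUI-GGUF's actual code.)

```python
# Simplified sketch of an architecture allow-list check on a GGUF header.
import gguf

TXT_ARCH_LIST = {"t5", "t5encoder", "llama"}  # example allow-list, not the real one

def check_text_encoder_arch(path: str) -> str:
    reader = gguf.GGUFReader(path)
    field = reader.fields["general.architecture"]
    # For a string field, the referenced part holds the raw bytes;
    # field decoding may differ across gguf-py versions.
    arch = bytes(field.parts[field.data[-1]]).decode("utf-8")
    if arch not in TXT_ARCH_LIST:
        raise ValueError(f"Unsupported text encoder architecture: {arch!r}")
    return arch
```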

thanks @city96; i'll pass it to @chatpig 🐷👀

both of you have valid points; i think it's better to keep both compatibility and possibility; personally i have no further comment🤣

btw @city96 could you try this t5xxl provided by @chatpig?
if it works, then you don't need to add extra code each time; guess just including "pig" in TXT_ARCH_LIST should be fine ❤️️

It works the same way the Q4_0 version works, if I disable the architecture check.

Adding "pig" to the arch list would mean no checking on what the model is, since your header doesn't include any information about the model architecture. As this is not something we can do, the next best thing would be adding the detection logic to detect and allow your custom T5 format, but yeah, that goes back to the same issue lol.

Currently, for image models with "pig" as the architecture, we invoke the model detection code used in convert.py to detect the model type. As all text encoders are either safetensors (since clip is small enough even at full precision) or already supported by llama.cpp (T5, llama, gemma), there's no detection code in place that we can fall back to.
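
(To illustrate what such a fallback would have to do, here is a hypothetical key-pattern guesser; the patterns are assumptions for illustration, not code that exists in ComfyUI-GGUF.)

```python
# Hypothetical fallback: when the header only says "pig", guess the model
# family from its tensor names.
from typing import Optional

def guess_text_encoder(keys: set) -> Optional[str]:
    if any(k.startswith("encoder.block.") for k in keys):
        return "t5"    # HF-style T5 encoder keys
    if any("text_model.encoder.layers." in k for k in keys):
        return "clip"  # HF-style CLIP keys (clip-l / clip-g)
    return None
```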

pig is essentially safetensors; if "pig" is in TXT_ARCH_LIST, the gguf versions of clip-g and clip-l will work as well

q4_1's architecture (in the metadata) is t5; it will pass your architecture check and jump to your t5 or t5encoder logic, but it works even with q4_0's tensor structure, which means the final destination is torch, not llama.cpp; the quant tag might not reflect the actual precision sometimes, so judge it by file size and the actual test results
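
(One way to judge by the actual contents rather than the quant tag in the filename is to list the per-tensor quantization types directly, e.g. with gguf-py; the output shown below is hypothetical.)

```python
# Count the quantization type actually used by each tensor in a GGUF file.
from collections import Counter
import gguf

def quant_summary(path: str) -> Counter:
    reader = gguf.GGUFReader(path)
    return Counter(t.tensor_type.name for t in reader.tensors)

# quant_summary("t5xxl-q4_1.gguf") might return something like
# Counter({'Q4_1': 169, 'F32': 73})  # hypothetical numbers
```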

It works the same way the Q4_0 version works, if I disable the architecture check.

you don't need to disable the architecture check; the architecture is set to "t5" already

you don't need to disable the architecture check; the architecture is set to "t5" already

Ah right, I missed that lol. Yeah, guess that'd bypass the arch check, though the model would still be non-standard for all intents and purposes.

pig is essentially safetensors;

That doesn't really change the fact that most software expects it to be in the established key format, though. For example, it doesn't work with transformers, which would break things like SDNext.


city96 changed discussion status to closed

Closing for now, but I do still think you should consider using the intended key format to avoid breaking downstream use cases.


it works; not q4_0, you should take q4_1; sure, chat soon
