Ian Andolina
iandol
3 followers · 23 following
AI & ML interests
None yet
Recent Activity
upvoted a paper 13 days ago:
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
reacted to bartowski's post 12 months ago:
So turns out I've been spreading a bit of misinformation when it comes to imatrix in llama.cpp.

It starts true: imatrix runs the model against a corpus of text and tracks the activation of weights to determine which are most important. However, what the quantization then does with that information is where I was wrong. I think I made the accidental connection between imatrix and ExLlamaV2's measuring, where ExLlamaV2 decides how many bits to assign to which weight depending on the goal BPW.

Instead, what llama.cpp with imatrix does is attempt to select a scale for a quantization block that most accurately returns the important weights to their original values, i.e. minimizing the dequantization error based on the importance of activations.

The mildly surprising part is that it actually just does a relatively brute-force search: it picks a bunch of scales, tries each, and sees which one results in the minimum error for the weights deemed important in the group.

But yeah, turns out the quantization scheme is always the same; it's just that the scaling has a bit more logic to it when you use imatrix.

Huge shoutout to @compilade for helping me wrap my head around it - feel free to add/correct as well if I've messed something up.
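To make the described scale search concrete, here is a minimal NumPy sketch, not llama.cpp's actual code: for one quantization block it tries several candidate scales and keeps the one whose round-trip (quantize then dequantize) error, weighted by per-weight importance standing in for imatrix statistics, is smallest. The function name, parameters, and candidate range are illustrative assumptions.

```python
import numpy as np

def pick_block_scale(weights, importance, n_bits=4, n_candidates=16):
    """Brute-force search for a block scale that minimizes
    importance-weighted dequantization error (illustrative sketch)."""
    qmax = 2 ** (n_bits - 1) - 1            # e.g. 7 for symmetric 4-bit
    base = np.max(np.abs(weights)) / qmax   # naive max-abs scale
    best_scale, best_err = base, np.inf
    # Try scales around the naive one; keep the one with the smallest
    # importance-weighted squared error after quantize -> dequantize.
    for factor in np.linspace(0.8, 1.2, n_candidates):
        scale = base * factor
        q = np.clip(np.round(weights / scale), -qmax - 1, qmax)
        err = np.sum(importance * (weights - q * scale) ** 2)
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale

# Hypothetical usage: a 32-weight block with activation-derived importances.
rng = np.random.default_rng(0)
w = rng.normal(size=32).astype(np.float32)
imp = rng.random(32).astype(np.float32)     # stand-in for imatrix data
print(pick_block_scale(w, imp))
```

The key point the post makes is visible here: the quantization format itself never changes; only the choice of scale is steered by the importance weights.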
liked a model about 1 year ago:
lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF
Organizations
models (0)
None public yet
datasets (0)
None public yet