Ian Andolina
iandol
3 followers · 23 following
AI & ML interests
None yet
Recent Activity
upvoted a paper 13 days ago:
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
reacted to bartowski's post 12 months ago:
So turns out I've been spreading a bit of misinformation when it comes to imatrix in llama.cpp.

It starts true: imatrix runs the model against a corpus of text and tracks the activation of weights to determine which are most important. However, what the quantization then does with that information is where I was wrong. I think I made the accidental connection between imatrix and ExLlamaV2's measuring, where ExLlamaV2 decides how many bits to assign to which weight depending on the goal BPW.

Instead, what llama.cpp with imatrix does is attempt to select a scale for a quantization block that most accurately returns the important weights to their original values, i.e. minimizing the dequantization error based on the importance of activations.

The mildly surprising part is that it actually just does a relatively brute-force search: it picks a bunch of scales, tries each, and sees which one results in the minimum error for the weights deemed important in the group.

But yeah, turns out the quantization scheme is always the same; it's just that the scaling has a bit more logic to it when you use imatrix.

Huge shoutout to @compilade for helping me wrap my head around it - feel free to add/correct as well if I've messed something up.
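To make the described scale search concrete, here is a minimal NumPy sketch, not llama.cpp's actual code: for one quantization block it tries several candidate scales and keeps the one whose round-trip (quantize then dequantize) error, weighted by per-weight importance standing in for imatrix statistics, is smallest. The function name, parameters, and candidate range are illustrative assumptions.

```python
import numpy as np

def pick_block_scale(weights, importance, n_bits=4, n_candidates=16):
    """Brute-force search for a block scale that minimizes
    importance-weighted dequantization error (illustrative sketch)."""
    qmax = 2 ** (n_bits - 1) - 1            # e.g. 7 for symmetric 4-bit
    base = np.max(np.abs(weights)) / qmax   # naive max-abs scale
    best_scale, best_err = base, np.inf
    # Try scales around the naive one; keep the one with the smallest
    # importance-weighted squared error after quantize -> dequantize.
    for factor in np.linspace(0.8, 1.2, n_candidates):
        scale = base * factor
        q = np.clip(np.round(weights / scale), -qmax - 1, qmax)
        err = np.sum(importance * (weights - q * scale) ** 2)
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale

# Hypothetical usage: a 32-weight block with activation-derived importances.
rng = np.random.default_rng(0)
w = rng.normal(size=32).astype(np.float32)
imp = rng.random(32).astype(np.float32)     # stand-in for imatrix data
print(pick_block_scale(w, imp))
```

The key point the post makes is visible here: the quantization format itself never changes; only the choice of scale is steered by the importance weights.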
liked a model about 1 year ago:
lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF
Organizations
models (0)
None public yet
datasets (0)
None public yet