Michael Goin (mgoin)
AI & ML interests
LLM inference optimization, compression, quantization, pruning, distillation
Recent Activity
Updated a model about 2 hours ago: nm-testing/Llama-3_1-Nemotron-Ultra-253B-v1-FP8-dynamic
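FP8-dynamic checkpoints like this one load directly in vLLM. A minimal sketch, assuming a recent vLLM build with FP8 support; the tensor_parallel_size is an assumption to adjust for your hardware (a 253B model needs a multi-GPU node):

```python
# Hedged sketch: serving an FP8-dynamic checkpoint with vLLM's offline API.
# tensor_parallel_size=8 is an assumption, not a recommendation.
from vllm import LLM, SamplingParams

llm = LLM(
    model="nm-testing/Llama-3_1-Nemotron-Ultra-253B-v1-FP8-dynamic",
    tensor_parallel_size=8,
)
params = SamplingParams(temperature=0.7, max_tokens=64)
out = llm.generate(["What does FP8 dynamic quantization change at inference time?"], params)
print(out[0].outputs[0].text)
```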
mgoin's activity
How to deploy this model without an internet connection · 1 · #1 opened 5 days ago by superahn
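For the air-gapped question above, one common pattern (a sketch, not the thread's answer; the repo id and paths are placeholders) is to mirror the repo while online, then load from local disk with Hub traffic disabled:

```python
# Step 1, on a machine with connectivity: mirror the weights to disk.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="nm-testing/Llama-3_1-Nemotron-Ultra-253B-v1-FP8-dynamic",  # placeholder repo
    local_dir="/models/nemotron-ultra-fp8",                             # placeholder path
)

# Step 2, on the offline machine: forbid Hub lookups and point vLLM at the copy.
import os
os.environ["HF_HUB_OFFLINE"] = "1"

from vllm import LLM
llm = LLM(model="/models/nemotron-ultra-fp8")
```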
Why not FP8 with static and per-tensor quantization? · 1 · #2 opened 11 days ago by wanzhenchn
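The static, per-tensor FP8 variant the question contrasts with maps onto llm-compressor's preset schemes. A hedged sketch only: the model, calibration dataset, and exact oneshot arguments are assumptions, and import paths shift between llm-compressor releases:

```python
from llmcompressor.transformers import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# "FP8" is the static per-tensor preset; "FP8_DYNAMIC" instead computes
# activation scales per token at runtime and needs no calibration pass.
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8",
    ignore=["lm_head"],  # common choice: keep the output head in higher precision
)

# Static activation scales are calibrated from sample data.
oneshot(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model
    dataset="open_platypus",                      # placeholder calibration set
    recipe=recipe,
    output_dir="Meta-Llama-3-8B-Instruct-FP8-static",
    max_seq_length=2048,
    num_calibration_samples=512,
)
```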
Address discrepancies in the languages supported by Mistral Small 3.1 2503 · 1 · 2 · #54 opened 16 days ago by fpaupier
Please update the chat template · 1 · #1 opened 16 days ago by stelterlab
FP8 Dynamic/W8A16 Quants Please · 4 · #44 opened 25 days ago by rjmehta
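Of the two schemes requested above, FP8 dynamic is the lighter one to produce since it needs no calibration data. A sketch in the same llm-compressor style as before; the model name and output path are placeholders:

```python
from llmcompressor.transformers import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# Per-channel FP8 weight scales; activation scales computed per token at runtime.
oneshot(
    model="mistralai/Mistral-Small-24B-Instruct-2501",  # placeholder model
    recipe=QuantizationModifier(
        targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"]
    ),
    output_dir="Mistral-Small-24B-Instruct-FP8-dynamic",
)
```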
Problem hosting the model using vLLM · 3 · 4 · #45 opened 25 days ago by ShaoServient
Remove image_processor_type · #1 opened about 2 months ago by pooya-davoodi-parasail
Remove image_processor_type · 1 · #1 opened about 2 months ago by pooya-davoodi-parasail
Remove image_processor_type · #2 opened about 2 months ago by pooya-davoodi-parasail
Use Qwen2VLImageProcessor for image_processor_type · 5 · #2 opened about 2 months ago by pooya-davoodi-parasail
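These image_processor_type threads come down to one field in the checkpoint's preprocessor_config.json, which tells Transformers which processor class to instantiate. A minimal sketch of the edit; the local path is an assumption:

```python
import json
from pathlib import Path

# Point Transformers at the correct image processor class for this checkpoint.
config_path = Path("/models/qwen2-vl-checkpoint/preprocessor_config.json")  # placeholder path
config = json.loads(config_path.read_text())
config["image_processor_type"] = "Qwen2VLImageProcessor"
config_path.write_text(json.dumps(config, indent=2))
```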
Use Qwen2VLImageProcessor for image_processor_type · #3 opened about 2 months ago by pooya-davoodi-parasail
When I use vLLM v0.7.2 to deploy R1 AWQ, I get empty content · 13 · #10 opened 2 months ago by bupalinyu
MLA is not supported with moe_wna16 quantization. Disabling MLA. · 5 · #7 opened 2 months ago by AMOSE
compressed-tensors MLA support requires fp8 activations and weights in group 'group_0' · 2 · #1 opened 3 months ago by samos123
How to load this model? · 2 · #1 opened 10 months ago by Frz614
Model does not run with vLLM · 2 · #3 opened 4 months ago by aswad546
Nice model, any info on scripts used to quantize? · 1 · #1 opened 4 months ago by RonanMcGovern