I understand the B200 can serve this model via TensorRT-LLM. Have you tried serving the NVFP4 model on an RTX 5090 using vLLM?