ZML just released a technical preview of their new Inference Engine: LLMD.
- Just a 2.4 GB container, which means fast startup times and efficient autoscaling
- Cross-platform GPU support: works on both NVIDIA and AMD GPUs
- Written in Zig
I just tried it out, deployed it on Hugging Face Inference Endpoints, and wrote a quick guide. You can try it in like 5 minutes!
https://huggingface.co/blog/erikkaum/test-driving-llmd-inference-engine
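If you want a feel for what querying such a deployment looks like, here's a minimal sketch. Assumptions not from the post: that the endpoint exposes an OpenAI-compatible `/v1/chat/completions` route (common for LLM serving engines, but check the guide for LLMD's actual API), and `ENDPOINT_URL` / `HF_TOKEN` are placeholders for your own deployment.

```python
# Minimal sketch for querying a deployed Inference Endpoint.
# Assumes an OpenAI-compatible chat route; see the guide above for specifics.
import os
import requests

ENDPOINT_URL = os.environ["ENDPOINT_URL"]  # e.g. the URL shown in your endpoint's dashboard
HF_TOKEN = os.environ["HF_TOKEN"]          # a Hugging Face access token

resp = requests.post(
    f"{ENDPOINT_URL}/v1/chat/completions",
    headers={"Authorization": f"Bearer {HF_TOKEN}"},
    json={
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 64,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```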