Do you want the lowest latency, the highest throughput, support for many models, or heavy optimization for one specific model?