---
license: mit
tags:
  - LiteRT
base_model:
  - deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
---

# litert-community/DeepSeek-R1-Distill-Qwen-1.5B

This model was converted to the LiteRT (formerly TFLite) format from [deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) using Google AI Edge Torch.

## Run the model in Colab


## Run the model on Android

Please follow the instructions.
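On Android, converted LiteRT LLMs are typically driven through the MediaPipe LLM Inference API. The sketch below is an assumption, not part of this model card: the model path, token limit, and prompt are placeholders, and the class names are taken from the MediaPipe `tasks-genai` Android library.

```kotlin
import com.google.mediapipe.tasks.genai.llminference.LlmInference

// Hypothetical setup: path assumes the .tflite model was pushed to the device,
// e.g. via `adb push`. Adjust to wherever your app stores the model.
val options = LlmInference.LlmInferenceOptions.builder()
    .setModelPath("/data/local/tmp/llm/model.tflite")  // placeholder path
    .setMaxTokens(512)
    .build()

// `context` is your Android Context (e.g. an Activity).
val llmInference = LlmInference.createFromOptions(context, options)

// Synchronous, blocking generation; use the async variant on a UI thread.
val response = llmInference.generateResponse("What is the capital of France?")
println(response)
```

This requires the `com.google.mediapipe:tasks-genai` dependency in your Gradle build; see the linked instructions for the exact version and model-loading steps.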

## Benchmarking results

Note that all benchmark stats are from a Samsung S24 Ultra.

| Model | Params | Task | GGML tk/s (CPU, 4 threads) | GGML tk/s (CPU, 8 threads) | LiteRT tk/s (XNNPACK, 4 threads) | LiteRT tk/s (XNNPACK, 8 threads) |
| --- | --- | --- | --- | --- | --- | --- |
| DeepSeek-R1-Distill-Qwen-1.5B (Int8 quantized) | 1.78 B | Prefill (512 tokens) | 64.66 | 87.18 | 260.95 | 299.15 |
| DeepSeek-R1-Distill-Qwen-1.5B (Int8 quantized) | 1.78 B | Decode (128 tokens) | 23.85 | 15.37 | 23.126 | 10.486 |
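The throughput numbers above translate directly into wall-clock latency: time is tokens divided by tokens/sec for each phase. A minimal sketch, using the LiteRT XNNPACK 4-thread column from the table:

```python
# Rough end-to-end latency estimate from the benchmark table
# (Samsung S24 Ultra, LiteRT / XNNPACK, 4 threads).
PREFILL_TOKENS = 512
DECODE_TOKENS = 128
PREFILL_TKS = 260.95   # prefill throughput, tokens/sec
DECODE_TKS = 23.126    # decode throughput, tokens/sec

prefill_s = PREFILL_TOKENS / PREFILL_TKS   # time to ingest the prompt
decode_s = DECODE_TOKENS / DECODE_TKS      # time to generate the response
total_s = prefill_s + decode_s

print(f"prefill: {prefill_s:.2f} s, decode: {decode_s:.2f} s, total: {total_s:.2f} s")
# → prefill: 1.96 s, decode: 5.53 s, total: 7.50 s
```

Note that decode dominates even for a short response, which is why the decode rate (and its drop at 8 threads) matters more for perceived responsiveness than the prefill rate.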