Commit b3a4c2c (verified) by schmidt-sebastian · Parent: 0f175a2

Add files using upload-large-folder tool
DeepSeek-R1-Distill-Qwen-1.5B_multi-prefill-seq_f32_ekv1280.tflite ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:4eb5bae61d9717fd9d0ff9c1f00266e27ab049ec807b1e9dd1440041600bcdfc
+size 7121909824
DeepSeek-R1-Distill-Qwen-1.5B_multi-prefill-seq_q8_ekv1280.tflite ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:ad356efd69259876152347e51cf776443a4fce3abe61d7212bb685c702d70560
+size 1858445888
DeepSeek-R1-Distill-Qwen-1.5B_seq128_f32_ekv1280.tflite ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:ba4dd48eec47d612b9501f8365671e4a9714487a67c4276a00309637e99f7a02
+size 7116878944
DeepSeek-R1-Distill-Qwen-1.5B_seq128_q8_ekv1280.tflite ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7b06368fa46eb6934daf035e5bcc7bdd37086f0fd92fbbe9ce744613a379e209
+size 1806773448
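The four `.tflite` files above are stored as Git LFS pointers, each a three-line text file (`version`, `oid`, `size`) rather than the model weights themselves. As an illustrative aside (not part of this repository), a minimal parser for that pointer format can sanity-check the fields:

```python
# Minimal parser for the three-line Git LFS pointer format shown above.
# Illustrative sketch only; field names follow the git-lfs spec/v1 layout.

def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into {'version', 'oid', 'size'}."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    fields["size"] = int(fields["size"])          # decimal byte count
    algo, _, digest = fields["oid"].partition(":")
    assert algo == "sha256" and len(digest) == 64  # sha256 hex digest
    return fields

# Pointer contents copied from the q8 multi-prefill-seq file above.
pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:ad356efd69259876152347e51cf776443a4fce3abe61d7212bb685c702d70560
size 1858445888
"""
info = parse_lfs_pointer(pointer)
print(info["size"])  # 1858445888 bytes, i.e. ~1.86 GB
```

The `size` field is what Hugging Face reports as the file size in the repo listing; the actual payload is fetched from LFS storage by `git lfs` or the Hub client.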
README.md CHANGED
@@ -1,34 +1,47 @@
 ---
 license: mit
-base_model:
-- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
+base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
+pipeline_tag: text-generation
+tags:
+- chat
 ---
 
 # litert-community/DeepSeek-R1-Distill-Qwen-1.5B
 
-This model provides a few variants of [deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) that are ready for deployment on Android using the [LiteRT (fka TFLite) stack](https://ai.google.dev/edge/litert) and [MediaPipe LLM Inference API](https://ai.google.dev/edge/mediapipe/solutions/genai/llm_inference).
+This model provides a few variants of
+[deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) that are ready for
+deployment on Android using the
+[LiteRT (fka TFLite) stack](https://ai.google.dev/edge/litert) and
+[MediaPipe LLM Inference API](https://ai.google.dev/edge/mediapipe/solutions/genai/llm_inference).
 
 ## Use the models
 
 ### Colab
 
-*Disclaimer: The target deployment surface for the LiteRT models is Android/iOS/Web and the stack has been optimized for performance on these targets. Trying out the system in Colab is an easier way to familiarize yourself with the LiteRT stack, with the caveat that the performance (memory and latency) on Colab could be much worse than on a local device.*
+*Disclaimer: The target deployment surface for the LiteRT models is
+Android/iOS/Web and the stack has been optimized for performance on these
+targets. Trying out the system in Colab is an easier way to familiarize yourself
+with the LiteRT stack, with the caveat that the performance (memory and latency)
+on Colab could be much worse than on a local device.*
 
-[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/#fileId=https://huggingface.co/litert-community/DeepSeek-R1-Distill-Qwen-1.5B/blob/main/deepseek_tflite.ipynb)
+[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/#fileId=https://huggingface.co/litert-community/DeepSeek-R1-Distill-Qwen-1.5B/blob/main/notebook.ipynb)
 
 ### Android
 
-* Download and install [the apk](https://github.com/google-ai-edge/mediapipe-samples/releases/download/v0.1.0/llm_inference_v0.1.0-debug.apk).
-* Follow the instructions in the app.
+* Download and install
+  [the apk](https://github.com/google-ai-edge/mediapipe-samples/releases/latest/download/llm_inference-debug.apk).
+* Follow the instructions in the app.
 
-To build the demo app from source, please follow the [instructions](https://github.com/google-ai-edge/mediapipe-samples/blob/main/examples/llm_inference/android/README.md) from the GitHub repository.
+To build the demo app from source, please follow the
+[instructions](https://github.com/google-ai-edge/mediapipe-samples/blob/main/examples/llm_inference/android/README.md)
+from the GitHub repository.
 
 ## Performance
 
 ### Android
 
-Note that all benchmark stats are from a Samsung S24 Ultra with 1280 KV cache size, 512 tokens prefill, 128 tokens decode.
+Note that all benchmark stats are from a Samsung S24 Ultra with
+1280 KV cache size with multiple prefill signatures enabled.
 
 <table border="1">
 <tr>
@@ -41,26 +54,30 @@ Note that all benchmark stats are from a Samsung S24 Ultra with 1280 KV cache si
 <th>Model size (MB)</th>
 </tr>
 <tr>
-<td>fp32 (baseline)</td>
-<td rowspan="2">CPU</td>
-<td><p style="text-align: right">45</p></td>
-<td><p style="text-align: right">6</p></td>
-<td><p style="text-align: right">8</p></td>
-<td><p style="text-align: right">6,213</p></td>
-<td><p style="text-align: right">7,124</p></td>
-</tr>
-<tr>
-<td>dynamic_int8</td>
-<td><p style="text-align: right">261</p></td>
-<td><p style="text-align: right">23</p></td>
-<td><p style="text-align: right">2</p></td>
-<td><p style="text-align: right">1,936</p></td>
-<td><p style="text-align: right">1,861</p></td>
-</tr>
+<td>fp32 (baseline)</td>
+<td>cpu</td>
+<td><p style="text-align: right">39.56 tk/s</p></td>
+<td><p style="text-align: right">1.43 tk/s</p></td>
+<td><p style="text-align: right">19.24 s</p></td>
+<td><p style="text-align: right">5,997 MB</p></td>
+<td><p style="text-align: right">6,794 MB</p></td>
+</tr>
+<tr>
+<td>dynamic_int8</td>
+<td>cpu</td>
+<td><p style="text-align: right">110.58 tk/s</p></td>
+<td><p style="text-align: right">12.96 tk/s</p></td>
+<td><p style="text-align: right">6.81 s</p></td>
+<td><p style="text-align: right">3,598 MB</p></td>
+<td><p style="text-align: right">1,774 MB</p></td>
+</tr>
+
 </table>
 
-* Model Size: measured by the size of the .tflite flatbuffer (serialization format for LiteRT models)
+* Model Size: measured by the size of the .tflite flatbuffer (serialization
+  format for LiteRT models)
 * Memory: indicator of peak RAM usage
-* The inference on CPU is accelerated via the LiteRT [XNNPACK](https://github.com/google/XNNPACK) delegate with 4 threads
+* The inference on CPU is accelerated via the LiteRT
+  [XNNPACK](https://github.com/google/XNNPACK) delegate with 4 threads
 * Benchmark is done assuming XNNPACK cache is enabled
-* dynamic_int8: quantized model with int8 weights and float activations.
+* dynamic_int8: quantized model with int8 weights and float activations.
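As a back-of-the-envelope illustration of what the updated table's tokens-per-second figures imply for end-to-end latency: the sketch below assumes 512 prefill tokens and 128 decode tokens (the counts the old README text used, not values from the new table) and ignores warm-up and model-load overhead, so it is a rough lower bound rather than a benchmark reproduction.

```python
# Rough latency estimate derived from the benchmark table's throughput numbers.
# Token counts (512 prefill / 128 decode) are assumptions; overheads ignored.

def latency_s(prefill_tokens: int, decode_tokens: int,
              prefill_tps: float, decode_tps: float) -> float:
    """Seconds to prefill then decode at the given tokens/sec rates."""
    return prefill_tokens / prefill_tps + decode_tokens / decode_tps

# dynamic_int8 on CPU: 110.58 tk/s prefill, 12.96 tk/s decode (from the table)
t_int8 = latency_s(512, 128, 110.58, 12.96)
# fp32 baseline on CPU: 39.56 tk/s prefill, 1.43 tk/s decode (from the table)
t_fp32 = latency_s(512, 128, 39.56, 1.43)
print(f"int8 ~{t_int8:.1f} s, fp32 ~{t_fp32:.1f} s")
```

Decode throughput dominates: at 1.43 tk/s the fp32 baseline spends roughly a minute and a half on 128 decode tokens alone, which is why the int8 variant is roughly 7x faster end to end despite a smaller gap in time-to-first-token.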
commit_hash ADDED
File without changes
notebook.ipynb ADDED
@@ -0,0 +1,1463 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": { "provenance": [] },
+    "kernelspec": { "name": "python3", "display_name": "Python 3" },
+    "language_info": { "name": "python" },
+    "widgets": { ... }

[The remainder of the 1,463-line notebook diff is auto-generated Jupyter widget
state: saved download progress bars for deepseek_q8_seq128_ekv1280.tflite
(1.81G), tokenizer_config.json (3.07k), and tokenizer.json. The diff is
truncated in this view.]
769
+ "model_module": "@jupyter-widgets/controls",
770
+ "model_name": "HTMLModel",
771
+ "model_module_version": "1.5.0",
772
+ "state": {
773
+ "_dom_classes": [],
774
+ "_model_module": "@jupyter-widgets/controls",
775
+ "_model_module_version": "1.5.0",
776
+ "_model_name": "HTMLModel",
777
+ "_view_count": null,
778
+ "_view_module": "@jupyter-widgets/controls",
779
+ "_view_module_version": "1.5.0",
780
+ "_view_name": "HTMLView",
781
+ "description": "",
782
+ "description_tooltip": null,
783
+ "layout": "IPY_MODEL_7d2023b2a9054a3991983a30fdc6555b",
784
+ "placeholder": "​",
785
+ "style": "IPY_MODEL_17d028b387724317ae9994819a97a3a4",
786
+ "value": " 7.03M/7.03M [00:00\u003c00:00, 28.7MB/s]"
787
+ }
788
+ },
789
+ "a063adb2cc1c44438d5f631fb16297ae": {
790
+ "model_module": "@jupyter-widgets/base",
791
+ "model_name": "LayoutModel",
792
+ "model_module_version": "1.2.0",
793
+ "state": {
794
+ "_model_module": "@jupyter-widgets/base",
795
+ "_model_module_version": "1.2.0",
796
+ "_model_name": "LayoutModel",
797
+ "_view_count": null,
798
+ "_view_module": "@jupyter-widgets/base",
799
+ "_view_module_version": "1.2.0",
800
+ "_view_name": "LayoutView",
801
+ "align_content": null,
802
+ "align_items": null,
803
+ "align_self": null,
804
+ "border": null,
805
+ "bottom": null,
806
+ "display": null,
807
+ "flex": null,
808
+ "flex_flow": null,
809
+ "grid_area": null,
810
+ "grid_auto_columns": null,
811
+ "grid_auto_flow": null,
812
+ "grid_auto_rows": null,
813
+ "grid_column": null,
814
+ "grid_gap": null,
815
+ "grid_row": null,
816
+ "grid_template_areas": null,
817
+ "grid_template_columns": null,
818
+ "grid_template_rows": null,
819
+ "height": null,
820
+ "justify_content": null,
821
+ "justify_items": null,
822
+ "left": null,
823
+ "margin": null,
824
+ "max_height": null,
825
+ "max_width": null,
826
+ "min_height": null,
827
+ "min_width": null,
828
+ "object_fit": null,
829
+ "object_position": null,
830
+ "order": null,
831
+ "overflow": null,
832
+ "overflow_x": null,
833
+ "overflow_y": null,
834
+ "padding": null,
835
+ "right": null,
836
+ "top": null,
837
+ "visibility": null,
838
+ "width": null
839
+ }
840
+ },
841
+ "50f86e2ac8444d1986d8d9afe9fcee37": {
842
+ "model_module": "@jupyter-widgets/base",
843
+ "model_name": "LayoutModel",
844
+ "model_module_version": "1.2.0",
845
+ "state": {
846
+ "_model_module": "@jupyter-widgets/base",
847
+ "_model_module_version": "1.2.0",
848
+ "_model_name": "LayoutModel",
849
+ "_view_count": null,
850
+ "_view_module": "@jupyter-widgets/base",
851
+ "_view_module_version": "1.2.0",
852
+ "_view_name": "LayoutView",
853
+ "align_content": null,
854
+ "align_items": null,
855
+ "align_self": null,
856
+ "border": null,
857
+ "bottom": null,
858
+ "display": null,
859
+ "flex": null,
860
+ "flex_flow": null,
861
+ "grid_area": null,
862
+ "grid_auto_columns": null,
863
+ "grid_auto_flow": null,
864
+ "grid_auto_rows": null,
865
+ "grid_column": null,
866
+ "grid_gap": null,
867
+ "grid_row": null,
868
+ "grid_template_areas": null,
869
+ "grid_template_columns": null,
870
+ "grid_template_rows": null,
871
+ "height": null,
872
+ "justify_content": null,
873
+ "justify_items": null,
874
+ "left": null,
875
+ "margin": null,
876
+ "max_height": null,
877
+ "max_width": null,
878
+ "min_height": null,
879
+ "min_width": null,
880
+ "object_fit": null,
881
+ "object_position": null,
882
+ "order": null,
883
+ "overflow": null,
884
+ "overflow_x": null,
885
+ "overflow_y": null,
886
+ "padding": null,
887
+ "right": null,
888
+ "top": null,
889
+ "visibility": null,
890
+ "width": null
891
+ }
892
+ },
893
+ "da323d8a744a43d8901f19c48b1e1223": {
894
+ "model_module": "@jupyter-widgets/controls",
895
+ "model_name": "DescriptionStyleModel",
896
+ "model_module_version": "1.5.0",
897
+ "state": {
898
+ "_model_module": "@jupyter-widgets/controls",
899
+ "_model_module_version": "1.5.0",
900
+ "_model_name": "DescriptionStyleModel",
901
+ "_view_count": null,
902
+ "_view_module": "@jupyter-widgets/base",
903
+ "_view_module_version": "1.2.0",
904
+ "_view_name": "StyleView",
905
+ "description_width": ""
906
+ }
907
+ },
908
+ "69afe592335b4d73b51b63e4c56407fc": {
909
+ "model_module": "@jupyter-widgets/base",
910
+ "model_name": "LayoutModel",
911
+ "model_module_version": "1.2.0",
912
+ "state": {
913
+ "_model_module": "@jupyter-widgets/base",
914
+ "_model_module_version": "1.2.0",
915
+ "_model_name": "LayoutModel",
916
+ "_view_count": null,
917
+ "_view_module": "@jupyter-widgets/base",
918
+ "_view_module_version": "1.2.0",
919
+ "_view_name": "LayoutView",
920
+ "align_content": null,
921
+ "align_items": null,
922
+ "align_self": null,
923
+ "border": null,
924
+ "bottom": null,
925
+ "display": null,
926
+ "flex": null,
927
+ "flex_flow": null,
928
+ "grid_area": null,
929
+ "grid_auto_columns": null,
930
+ "grid_auto_flow": null,
931
+ "grid_auto_rows": null,
932
+ "grid_column": null,
933
+ "grid_gap": null,
934
+ "grid_row": null,
935
+ "grid_template_areas": null,
936
+ "grid_template_columns": null,
937
+ "grid_template_rows": null,
938
+ "height": null,
939
+ "justify_content": null,
940
+ "justify_items": null,
941
+ "left": null,
942
+ "margin": null,
943
+ "max_height": null,
944
+ "max_width": null,
945
+ "min_height": null,
946
+ "min_width": null,
947
+ "object_fit": null,
948
+ "object_position": null,
949
+ "order": null,
950
+ "overflow": null,
951
+ "overflow_x": null,
952
+ "overflow_y": null,
953
+ "padding": null,
954
+ "right": null,
955
+ "top": null,
956
+ "visibility": null,
957
+ "width": null
958
+ }
959
+ },
960
+ "f3605ab95cbf4ebda9a678a0788e9682": {
961
+ "model_module": "@jupyter-widgets/controls",
962
+ "model_name": "ProgressStyleModel",
963
+ "model_module_version": "1.5.0",
964
+ "state": {
965
+ "_model_module": "@jupyter-widgets/controls",
966
+ "_model_module_version": "1.5.0",
967
+ "_model_name": "ProgressStyleModel",
968
+ "_view_count": null,
969
+ "_view_module": "@jupyter-widgets/base",
970
+ "_view_module_version": "1.2.0",
971
+ "_view_name": "StyleView",
972
+ "bar_color": null,
973
+ "description_width": ""
974
+ }
975
+ },
976
+ "7d2023b2a9054a3991983a30fdc6555b": {
977
+ "model_module": "@jupyter-widgets/base",
978
+ "model_name": "LayoutModel",
979
+ "model_module_version": "1.2.0",
980
+ "state": {
981
+ "_model_module": "@jupyter-widgets/base",
982
+ "_model_module_version": "1.2.0",
983
+ "_model_name": "LayoutModel",
984
+ "_view_count": null,
985
+ "_view_module": "@jupyter-widgets/base",
986
+ "_view_module_version": "1.2.0",
987
+ "_view_name": "LayoutView",
988
+ "align_content": null,
989
+ "align_items": null,
990
+ "align_self": null,
991
+ "border": null,
992
+ "bottom": null,
993
+ "display": null,
994
+ "flex": null,
995
+ "flex_flow": null,
996
+ "grid_area": null,
997
+ "grid_auto_columns": null,
998
+ "grid_auto_flow": null,
999
+ "grid_auto_rows": null,
1000
+ "grid_column": null,
1001
+ "grid_gap": null,
1002
+ "grid_row": null,
1003
+ "grid_template_areas": null,
1004
+ "grid_template_columns": null,
1005
+ "grid_template_rows": null,
1006
+ "height": null,
1007
+ "justify_content": null,
1008
+ "justify_items": null,
1009
+ "left": null,
1010
+ "margin": null,
1011
+ "max_height": null,
1012
+ "max_width": null,
1013
+ "min_height": null,
1014
+ "min_width": null,
1015
+ "object_fit": null,
1016
+ "object_position": null,
1017
+ "order": null,
1018
+ "overflow": null,
1019
+ "overflow_x": null,
1020
+ "overflow_y": null,
1021
+ "padding": null,
1022
+ "right": null,
1023
+ "top": null,
1024
+ "visibility": null,
1025
+ "width": null
1026
+ }
1027
+ },
1028
+ "17d028b387724317ae9994819a97a3a4": {
1029
+ "model_module": "@jupyter-widgets/controls",
1030
+ "model_name": "DescriptionStyleModel",
1031
+ "model_module_version": "1.5.0",
1032
+ "state": {
1033
+ "_model_module": "@jupyter-widgets/controls",
1034
+ "_model_module_version": "1.5.0",
1035
+ "_model_name": "DescriptionStyleModel",
1036
+ "_view_count": null,
1037
+ "_view_module": "@jupyter-widgets/base",
1038
+ "_view_module_version": "1.2.0",
1039
+ "_view_name": "StyleView",
1040
+ "description_width": ""
1041
+ }
1042
+ }
1043
+ }
1044
+ }
1045
+ },
1046
+ "cells": [
1047
+ {
1048
+ "cell_type": "markdown",
1049
+ "source": [
1050
+ "# Install dependencies"
1051
+ ],
1052
+ "metadata": {
1053
+ "id": "39AMoCOa1ckc"
1054
+ }
1055
+ },
1056
+ {
1066
+ "cell_type": "code",
1067
+ "source": [
1068
+ "!pip install ai-edge-litert"
1069
+ ],
1070
+ "metadata": {
1071
+ "id": "43tAeO0AZ7zp",
1072
+ "colab": {
1073
+ "base_uri": "https://localhost:8080/"
1074
+ },
1075
+ "outputId": "76cd0d1b-7de2-4519-c0ae-1b9e6ee37653"
1076
+ },
1077
+ "execution_count": 1,
1078
+ "outputs": [
1079
+ {
1080
+ "output_type": "stream",
1081
+ "name": "stdout",
1082
+ "text": []
1083
+ }
1084
+ ]
1085
+ },
1086
+ {
1087
+ "cell_type": "code",
1088
+ "source": [
1089
+ "from collections.abc import Sequence\n",
1090
+ "import sys\n",
1091
+ "from ai_edge_litert import interpreter as interpreter_lib\n",
1092
+ "import numpy as np\n",
1093
+ "from transformers import AutoTokenizer"
1094
+ ],
1095
+ "metadata": {
1096
+ "id": "i6PMkMVBPr1p"
1097
+ },
1098
+ "execution_count": 2,
1099
+ "outputs": []
1100
+ },
1101
+ {
1102
+ "cell_type": "markdown",
1103
+ "source": [
1104
+ "# Download model files"
1105
+ ],
1106
+ "metadata": {
1107
+ "id": "K5okZCTgYpUd"
1108
+ }
1109
+ },
1110
+ {
1111
+ "cell_type": "code",
1112
+ "source": [
1113
+ "from huggingface_hub import hf_hub_download\n",
1114
+ "\n",
1115
+ "model_path = hf_hub_download(\n",
1116
+ " repo_id=\"litert-community/DeepSeek-R1-Distill-Qwen-1.5B\",\n",
1117
+ " filename=\"DeepSeek-R1-Distill-Qwen-1.5B_seq128_q8_ekv1280.tflite\",\n",
1118
+ ")"
1119
+ ],
1120
+ "metadata": {
1121
+ "id": "3t47HAG2tvc3",
1122
+ "colab": {
1123
+ "base_uri": "https://localhost:8080/",
1124
+ "height": 49,
1125
+ "referenced_widgets": [
1126
+ "47cd47140dbb4e28a4f31d5632bfe82d",
1127
+ "7c0ddb1e0e3145f08ccb0c32b02c562f",
1128
+ "85c490db972b4d659caad513359a6700",
1129
+ "d61e96ae08d84414a638dd592f13fb18",
1130
+ "9e7f4734aa034e4aa5207b8a2498ee02",
1131
+ "df08ba8056fb47cb969e132087987e68",
1132
+ "470febc3af8348ef8611255e88401229",
1133
+ "39cedca11f574c01808acdc1be9aa68d",
1134
+ "62bd6d393ca74193bded59a8ebd0a749",
1135
+ "475c5c4fc6eb404180d7b69d75f797ea",
1136
+ "b815fc17c9ee4913b5cb452653ff1af9"
1137
+ ]
1138
+ },
1139
+ "outputId": "d1d8ed1a-5ec6-4121-9d3c-fada487fc8ed"
1140
+ },
1141
+ "execution_count": 3,
1142
+ "outputs": []
1143
+ },
1144
+ {
1145
+ "cell_type": "markdown",
1146
+ "source": [
1147
+ "# Create LiteRT interpreter and tokenizer"
1148
+ ],
1149
+ "metadata": {
1150
+ "id": "n5Xa4s6XhWqk"
1151
+ }
1152
+ },
1153
+ {
1154
+ "cell_type": "code",
1155
+ "source": [
1156
+ "interpreter = interpreter_lib.InterpreterWithCustomOps(\n",
1157
+ " custom_op_registerers=[\"pywrap_genai_ops.GenAIOpsRegisterer\"],\n",
1158
+ " model_path=model_path,\n",
1159
+ " num_threads=2,\n",
1160
+ " experimental_default_delegate_latest_features=True,\n",
1161
+ ")\n",
1162
+ "tokenizer = AutoTokenizer.from_pretrained(\"deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B\")"
1163
+ ],
1164
+ "metadata": {
1165
+ "id": "Rvdn3EIZhaQn",
1166
+ "colab": {
1167
+ "base_uri": "https://localhost:8080/",
1168
+ "height": 81,
1169
+ "referenced_widgets": [
1170
+ "8cac4d03da1044d6adb8b62752ed6775",
1171
+ "a201091e2f9b4f6c8a7d780dde854134",
1172
+ "16e2c22fb42e41e8b810c4e659091d37",
1173
+ "a1f5e814104646cbac5db19fdbcfccb2",
1174
+ "3186fb1553884a7da72a387f1e00eca5",
1175
+ "875fbcb976bf486092d3c6f483b9e042",
1176
+ "e2a24c0c90b149508715998b1cf301f7",
1177
+ "c730ecd68ae547b1822039b86bd22322",
1178
+ "0cd73c61a5e04ae1854eb1f1c4d92317",
1179
+ "c46a9a3e8c7d4560ae71226920e17acd",
1180
+ "2303aed14ff44e178ed20edf1f2e5359",
1181
+ "072e1baca7d64766807df5454dc9e3cc",
1182
+ "6da37a13974c4c3890c7676d194021bc",
1183
+ "2f5b6f1af091405287c35c53ad169354",
1184
+ "b977fb3e42a14fe1bec47426ae1efded",
1185
+ "a063adb2cc1c44438d5f631fb16297ae",
1186
+ "50f86e2ac8444d1986d8d9afe9fcee37",
1187
+ "da323d8a744a43d8901f19c48b1e1223",
1188
+ "69afe592335b4d73b51b63e4c56407fc",
1189
+ "f3605ab95cbf4ebda9a678a0788e9682",
1190
+ "7d2023b2a9054a3991983a30fdc6555b",
1191
+ "17d028b387724317ae9994819a97a3a4"
1192
+ ]
1193
+ },
1194
+ "outputId": "e05a5944-5312-41c4-e38e-7e26a921e63c"
1195
+ },
1196
+ "execution_count": 4,
1197
+ "outputs": []
1198
+ },
1199
+ {
1200
+ "cell_type": "markdown",
1201
+ "source": [
1202
+ "# Create pipeline with LiteRT models"
1203
+ ],
1204
+ "metadata": {
1205
+ "id": "AM6rDABTXt2F"
1206
+ }
1207
+ },
1208
+ {
1209
+ "cell_type": "code",
1210
+ "source": [
1211
+ "class LiteRTLlmPipeline:\n",
1212
+ "\n",
1213
+ " def __init__(self, interpreter, tokenizer):\n",
1214
+ " \"\"\"Initializes the pipeline.\"\"\"\n",
1215
+ " self._interpreter = interpreter\n",
1216
+ " self._tokenizer = tokenizer\n",
1217
+ "\n",
1218
+ " self._prefill_runner = None\n",
1219
+ " self._decode_runner = self._interpreter.get_signature_runner(\"decode\")\n",
1220
+ "\n",
1221
+ " def _init_prefill_runner(self, num_input_tokens: int):\n",
1222
+ " \"\"\"Initializes all the variables related to the prefill runner.\n",
1223
+ "\n",
1224
+ " This method initializes the following variables:\n",
1225
+ " - self._prefill_runner: The prefill runner based on the input size.\n",
1226
+ " - self._max_seq_len: The maximum sequence length supported by the model.\n",
1227
+ " - self._max_kv_cache_seq_len: The maximum sequence length supported by the\n",
1228
+ " KV cache.\n",
1229
+ "\n",
1230
+ " Args:\n",
1231
+ " num_input_tokens: The number of input tokens.\n",
1232
+ " \"\"\"\n",
1233
+ " if not self._interpreter:\n",
1234
+ " raise ValueError(\"Interpreter is not initialized.\")\n",
1235
+ "\n",
1236
+ " # The prefill runner and its sequence-length limits are selected\n",
+ " # based on the number of input tokens.\n",
1238
+ " self._prefill_runner = self._get_prefill_runner(num_input_tokens)\n",
1239
+ " # input_token_shape has shape (batch, max_seq_len)\n",
1240
+ " input_token_shape = self._prefill_runner.get_input_details()[\"tokens\"][\n",
1241
+ " \"shape\"\n",
1242
+ " ]\n",
1243
+ " if len(input_token_shape) == 1:\n",
1244
+ " self._max_seq_len = input_token_shape[0]\n",
1245
+ " else:\n",
1246
+ " self._max_seq_len = input_token_shape[1]\n",
1247
+ "\n",
1248
+ " # kv cache input has shape [batch=1, seq_len, num_heads, dim].\n",
1249
+ " kv_cache_shape = self._prefill_runner.get_input_details()[\"kv_cache_k_0\"][\n",
1250
+ " \"shape\"\n",
1251
+ " ]\n",
1252
+ " self._max_kv_cache_seq_len = kv_cache_shape[1]\n",
1253
+ "\n",
1254
+ " def _init_kv_cache(self) -\u003e dict[str, np.ndarray]:\n",
1255
+ " if self._prefill_runner is None:\n",
1256
+ " raise ValueError(\"Prefill runner is not initialized.\")\n",
1257
+ " kv_cache = {}\n",
1258
+ " for input_key in self._prefill_runner.get_input_details().keys():\n",
1259
+ " if \"kv_cache\" in input_key:\n",
1260
+ " kv_cache[input_key] = np.zeros(\n",
1261
+ " self._prefill_runner.get_input_details()[input_key][\"shape\"],\n",
1262
+ " dtype=np.float32,\n",
1263
+ " )\n",
1264
1268
+ " return kv_cache\n",
1269
+ "\n",
1270
+ " def _get_prefill_runner(self, num_input_tokens: int):\n",
1271
+ " \"\"\"Gets the prefill runner with the most suitable input size.\n",
1272
+ "\n",
1273
+ " Args:\n",
1274
+ " num_input_tokens: The number of input tokens.\n",
1275
+ "\n",
1276
+ " Returns:\n",
1277
+ " The prefill runner with the smallest input size.\n",
1278
+ " \"\"\"\n",
1279
+ " best_signature = None\n",
1280
+ " delta = sys.maxsize\n",
1281
+ " max_prefill_len = -1\n",
1282
+ " for key in self._interpreter.get_signature_list().keys():\n",
1283
+ " if \"prefill\" not in key:\n",
1284
+ " continue\n",
1285
+ " input_pos = self._interpreter.get_signature_runner(\n",
1286
+ " key\n",
1287
+ " ).get_input_details()[\"input_pos\"]\n",
1288
+ " # input_pos[\"shape\"] has shape (max_seq_len, )\n",
1289
+ " seq_size = input_pos[\"shape\"][0]\n",
1290
+ " max_prefill_len = max(max_prefill_len, seq_size)\n",
1291
+ " if num_input_tokens \u003c= seq_size and seq_size - num_input_tokens \u003c delta:\n",
1292
+ " delta = seq_size - num_input_tokens\n",
1293
+ " best_signature = key\n",
1294
+ " if best_signature is None:\n",
1295
+ " raise ValueError(\n",
1296
+ " \"The largest prefill length supported is %d, but we have %d number of\"\n",
1297
+ " \" input tokens\" % (max_prefill_len, num_input_tokens)\n",
1298
+ " )\n",
1299
+ " return self._interpreter.get_signature_runner(best_signature)\n",
1300
+ "\n",
1301
+ " def _run_prefill(\n",
1302
+ " self,\n",
1303
+ " prefill_token_ids: Sequence[int],\n",
1304
+ " ) -\u003e dict[str, np.ndarray]:\n",
1305
+ " \"\"\"Runs prefill and returns the kv cache.\n",
1306
+ "\n",
1307
+ " Args:\n",
1308
+ " prefill_token_ids: The token ids of the prefill input.\n",
1309
+ "\n",
1310
+ " Returns:\n",
1311
+ " The updated kv cache.\n",
1312
+ " \"\"\"\n",
1313
+ " if not self._prefill_runner:\n",
1314
+ " raise ValueError(\"Prefill runner is not initialized.\")\n",
1315
+ " prefill_token_length = len(prefill_token_ids)\n",
1316
+ " if prefill_token_length == 0:\n",
1317
+ " return self._init_kv_cache()\n",
1318
+ "\n",
1319
+ " # Prepare the input to be [1, max_seq_len].\n",
1320
+ " input_token_ids = [0] * self._max_seq_len\n",
1321
+ " input_token_ids[:prefill_token_length] = prefill_token_ids\n",
1322
+ " input_token_ids = np.asarray(input_token_ids, dtype=np.int32)\n",
1323
+ " input_token_ids = np.expand_dims(input_token_ids, axis=0)\n",
1324
+ "\n",
1325
+ " # Prepare the input position to be [max_seq_len].\n",
1326
+ " input_pos = [0] * self._max_seq_len\n",
1327
+ " input_pos[:prefill_token_length] = range(prefill_token_length)\n",
1328
+ " input_pos = np.asarray(input_pos, dtype=np.int32)\n",
1329
+ "\n",
1330
+ " # Initialize kv cache.\n",
1331
+ " prefill_inputs = self._init_kv_cache()\n",
1332
+ " prefill_inputs.update({\n",
1333
+ " \"tokens\": input_token_ids,\n",
1334
+ " \"input_pos\": input_pos,\n",
1335
+ " })\n",
1336
+ " prefill_outputs = self._prefill_runner(**prefill_inputs)\n",
1337
+ " if \"logits\" in prefill_outputs:\n",
1338
+ " # Prefill outputs include logits and kv cache. We only return the kv cache.\n",
1339
+ " prefill_outputs.pop(\"logits\")\n",
1340
+ "\n",
1341
+ " return prefill_outputs\n",
1342
+ "\n",
1343
+ " def _greedy_sampler(self, logits: np.ndarray) -\u003e int:\n",
1344
+ " return int(np.argmax(logits))\n",
1345
+ "\n",
1346
+ " def _run_decode(\n",
1347
+ " self,\n",
1348
+ " start_pos: int,\n",
1349
+ " start_token_id: int,\n",
1350
+ " kv_cache: dict[str, np.ndarray],\n",
1351
+ " max_decode_steps: int,\n",
1352
+ " ) -\u003e str:\n",
1353
+ " \"\"\"Runs decode and outputs the token ids from greedy sampler.\n",
1354
+ "\n",
1355
+ " Args:\n",
1356
+ " start_pos: The position of the first token of the decode input.\n",
1357
+ " start_token_id: The token id of the first token of the decode input.\n",
1358
+ " kv_cache: The kv cache from the prefill.\n",
1359
+ " max_decode_steps: The max decode steps.\n",
1360
+ "\n",
1361
+ " Returns:\n",
1362
+ " The token ids from the greedy sampler.\n",
1363
+ " \"\"\"\n",
1364
+ " next_pos = start_pos\n",
1365
+ " next_token = start_token_id\n",
1366
+ " decode_text = []\n",
1367
+ " decode_inputs = kv_cache\n",
1368
+ "\n",
1369
+ " for _ in range(max_decode_steps):\n",
1370
+ " decode_inputs.update({\n",
1371
+ " \"tokens\": np.array([[next_token]], dtype=np.int32),\n",
1372
+ " \"input_pos\": np.array([next_pos], dtype=np.int32),\n",
1373
+ " })\n",
1374
+ " decode_outputs = self._decode_runner(**decode_inputs)\n",
1375
+ " # Output logits have shape (batch=1, 1, vocab_size). We only take the first\n",
1376
+ " # element.\n",
1377
+ " logits = decode_outputs.pop(\"logits\")[0][0]\n",
1378
+ " next_token = self._greedy_sampler(logits)\n",
1379
+ " if next_token == self._tokenizer.eos_token_id:\n",
1380
+ " break\n",
1381
+ " decode_text.append(\n",
1382
+ " self._tokenizer.decode(next_token, skip_special_tokens=False)\n",
1383
+ " )\n",
1384
+ " print(decode_text[-1], end=\"\", flush=True)\n",
1385
+ " # Decode outputs include logits and kv cache. We already popped out\n",
1386
+ " # logits, so the rest is kv cache. We pass the updated kv cache as input\n",
1387
+ " # to the next decode step.\n",
1388
+ " decode_inputs = decode_outputs\n",
1389
+ " next_pos += 1\n",
1390
+ "\n",
1391
+ " print() # print a new line at the end.\n",
1392
+ " return \"\".join(decode_text)\n",
1393
+ "\n",
1394
+ " def generate(self, prompt: str, max_decode_steps: int | None = None) -\u003e str:\n",
1395
+ " token_ids = self._tokenizer.encode(\n",
1396
+ " f\"<|begin▁of▁sentence|><|User|>{prompt}<|Assistant|><think>\\n\"\n",
1397
+ " )\n",
1398
+ " # Initialize the prefill runner with the suitable input size.\n",
1399
+ " self._init_prefill_runner(len(token_ids))\n",
1400
+ "\n",
1401
+ " # Run prefill.\n",
1402
+ " # Prefill up to the second-to-last token of the prompt, because the last\n",
1403
+ " # token of the prompt will be used to bootstrap decode.\n",
1404
+ " prefill_token_length = len(token_ids) - 1\n",
1405
+ "\n",
1406
+ " print(\"Running prefill\")\n",
1407
+ " kv_cache = self._run_prefill(token_ids[:prefill_token_length])\n",
1408
+ " # Run decode.\n",
1409
+ " print(\"Running decode\")\n",
1410
+ " actual_max_decode_steps = (\n",
1411
+ " self._max_kv_cache_seq_len - prefill_token_length - 1\n",
1412
+ " )\n",
1413
+ " if max_decode_steps is not None:\n",
1414
+ " actual_max_decode_steps = min(actual_max_decode_steps, max_decode_steps)\n",
1415
+ " decode_text = self._run_decode(\n",
1416
+ " prefill_token_length,\n",
1417
+ " token_ids[prefill_token_length],\n",
1418
+ " kv_cache,\n",
1419
+ " actual_max_decode_steps,\n",
1420
+ " )\n",
1421
+ " return decode_text"
1422
+ ],
1423
+ "metadata": {
1424
+ "id": "UBSGrHrM4ANm"
1425
+ },
1426
+ "execution_count": 15,
1427
+ "outputs": []
1428
+ },
1429
+ {
1430
+ "cell_type": "markdown",
1431
+ "source": [
1432
+ "# Generate text from model"
1433
+ ],
1434
+ "metadata": {
1435
+ "id": "dASKx_JtYXwe"
1436
+ }
1437
+ },
1438
+ {
1439
+ "cell_type": "code",
1440
+ "source": [
1441
+ "# Disclaimer: Model performance demonstrated with the Python API in this notebook is not representative of performance on a local device.\n",
1442
+ "pipeline = LiteRTLlmPipeline(interpreter, tokenizer)"
1443
+ ],
1444
+ "metadata": {
1445
+ "id": "AZhlDQWg61AL"
1446
+ },
1447
+ "execution_count": 16,
1448
+ "outputs": []
1449
+ },
1450
+ {
1451
+ "cell_type": "code",
1452
+ "source": [
1453
+ "prompt = \"What is the capital of France?\"\n",
1454
+ "output = pipeline.generate(prompt, max_decode_steps=None)"
1455
+ ],
1456
+ "metadata": {
1457
+ "id": "wT9BIiATkjzL"
1458
+ },
1459
+ "execution_count": null,
1460
+ "outputs": []
1461
+ }
1462
+ ]
1463
+ }