Update README.md
README.md
@@ -1,53 +1,19 @@
 ---
 license: mit
 pipeline_tag: text-generation
-tags:
 inference: false
 ---
-
-# Phi-3 Mini-4K-Instruct ONNX DirectML models
-
-<!-- Provide a quick summary of what the model is/does. -->
-This repository hosts the optimized versions of [Phi-3-mini-4k-instruct](https://aka.ms/phi3-mini-4k-instruct) to accelerate inference with ONNX Runtime.
-
-Phi-3 Mini is a lightweight, state-of-the-art open model built upon the datasets used for Phi-2 - synthetic data and filtered websites - with a focus on very high-quality, reasoning-dense data. The model belongs to the Phi-3 model family, and the mini version comes in two variants, 4K and 128K, which is the context length (in tokens) each can support. The model underwent a rigorous enhancement process, incorporating both supervised fine-tuning and direct preference optimization to ensure precise instruction adherence and robust safety measures.
-
-Optimized Phi-3 Mini models are published here in [ONNX](https://onnx.ai) format to run with [ONNX Runtime](https://onnxruntime.ai/) on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these targets.
-
-[DirectML](https://aka.ms/directml) support lets developers bring hardware acceleration to Windows devices at scale across AMD, Intel, and NVIDIA GPUs. Along with DirectML, ONNX Runtime provides cross-platform support for Phi-3 Mini across a range of devices for CPU, GPU, and mobile.
-
-To easily get started with Phi-3, you can use our newly introduced ONNX Runtime Generate() API. See [here](https://aka.ms/generate-tutorial) for instructions on how to run it.
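As a rough sketch of what driving this model through the Generate() API looks like (the `onnxruntime-genai` API surface has changed across releases, the model directory path is a placeholder, and the chat template below is an assumption based on the Phi-3 instruct format; consult the linked tutorial for the authoritative version):

```python
# Minimal sketch: running a Phi-3 ONNX model with the onnxruntime-genai
# package. The model directory is a placeholder; the og.* calls follow the
# Generate() API roughly as published around the Phi-3 release and may
# differ in newer package versions.
import os


def format_prompt(user_message: str) -> str:
    """Wrap a user message in the Phi-3 instruct chat template
    (assumed template; verify against the model's tokenizer config)."""
    return f"<|user|>\n{user_message}<|end|>\n<|assistant|>\n"


def generate(model_dir: str, user_message: str, max_length: int = 256) -> str:
    # Deferred import so the helper above stays usable without the package.
    import onnxruntime_genai as og  # e.g. pip install onnxruntime-genai-directml

    model = og.Model(model_dir)
    tokenizer = og.Tokenizer(model)
    params = og.GeneratorParams(model)
    params.set_search_options(max_length=max_length)
    params.input_ids = tokenizer.encode(format_prompt(user_message))
    generator = og.Generator(model, params)
    while not generator.is_done():
        generator.compute_logits()
        generator.generate_next_token()
    return tokenizer.decode(generator.get_sequence(0))


if __name__ == "__main__":
    model_dir = "Phi-3-mini-4k-instruct-onnx-directml"  # placeholder path
    if os.path.isdir(model_dir):
        print(generate(model_dir, "What is DirectML?"))
```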
-
-## ONNX Models
-
-The optimized configurations we have added:
-
-- ONNX model for int4 DML: ONNX model for AMD, Intel, and NVIDIA GPUs on Windows, quantized to int4 using [AWQ](https://arxiv.org/abs/2306.00978).
-
-
-## Hardware Supported
-
-The models are tested on:
-- GPU SKU: RTX 4090 (DirectML)
-
-Minimum Configuration Required:
-- Windows: DirectX 12-capable GPU and a minimum of 4GB of combined RAM
-- CUDA: NVIDIA GPU with [Compute Capability](https://developer.nvidia.com/cuda-gpus) >= 7.0
-
-### Model Description
-
-- **Developed by:** Microsoft
-- **Model type:** ONNX
-- **Language(s) (NLP):** Python, C, C++
-- **License:** MIT
-- **Model Description:** This is a conversion of the Phi-3 Mini-4K-Instruct model for ONNX Runtime inference.
-
-## Additional Details
-- [**ONNX Runtime Optimizations Blog Link**](https://aka.ms/phi3-optimizations)
-- [**Phi-3 Model Blog Link**](https://aka.ms/phi3blog-april)
-- [**Phi-3 Model Card**](https://aka.ms/phi3-mini-4k-instruct)
-- [**Phi-3 Technical Report**](https://aka.ms/phi3-tech-report)
-
 
 ## Performance Metrics
 
@@ -57,16 +23,16 @@ We measured the performance of DirectML on AMD Ryzen 9 7940HS /w Radeon 78
 
 | Prompt Length | Generation Length | Average Throughput (tps) |
 |---------------------------|-------------------|-----------------------------|
-| 128 | 128 |
-| 128 | 256 |
-| 128 | 512 |
-| 128 | 1024 |
-| 256 | 128 |
-| 256 | 256 |
-| 256 | 512 |
-| 256 | 1024 |
-| 512 | 128 |
-| 512 | 256 |
 | 512 | 512 | - |
 | 512 | 1024 | - |
 | 1024 | 128 | - |
 ---
 license: mit
 pipeline_tag: text-generation
+tags:
+- ONNX
+- DML
+- ONNXRuntime
+- phi3
+- nlp
+- conversational
+- custom_code
 inference: false
+language:
+- en
 ---
+# EmbeddedLLM/Phi-3-mini-4k-instruct-onnx-directml
 
 ## Performance Metrics
 
 
 | Prompt Length | Generation Length | Average Throughput (tps) |
 |---------------------------|-------------------|-----------------------------|
+| 128 | 128 | - |
+| 128 | 256 | - |
+| 128 | 512 | - |
+| 128 | 1024 | - |
+| 256 | 128 | - |
+| 256 | 256 | - |
+| 256 | 512 | - |
+| 256 | 1024 | - |
+| 512 | 128 | - |
+| 512 | 256 | - |
 | 512 | 512 | - |
 | 512 | 1024 | - |
 | 1024 | 128 | - |
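The throughput column above is tokens per second (tps); the entries are still unfilled. One simple way to fill them in, sketched under the assumption that you have some callable that generates a requested number of tokens, is to time a full generation and divide:

```python
# Sketch: measuring average generation throughput in tokens per second.
# `generate_tokens` is a hypothetical callable (not part of any library
# named in this card) that produces `num_tokens` tokens when invoked.
import time


def measure_tps(generate_tokens, num_tokens: int) -> float:
    """Return average generated tokens per second for one timed run."""
    start = time.perf_counter()
    generate_tokens(num_tokens)
    elapsed = time.perf_counter() - start
    return num_tokens / elapsed
```

In practice you would warm the model up first and average over several runs, since the first DirectML inference includes shader compilation overhead.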