ssyok committed · Commit bcf089d · verified · 1 parent: e34567f

Update README.md

Files changed (1): README.md (+21, -55)
README.md CHANGED
@@ -1,53 +1,19 @@
  ---
  license: mit
  pipeline_tag: text-generation
- tags: [ONNX, DML, ONNXRuntime, phi3, nlp, conversational, custom_code]
  inference: false
  ---
-
- # Phi-3 Mini-4K-Instruct ONNX DirectML models
-
- <!-- Provide a quick summary of what the model is/does. -->
- This repository hosts the optimized versions of [Phi-3-mini-4k-instruct](https://aka.ms/phi3-mini-4k-instruct) to accelerate inference with ONNX Runtime.
-
- Phi-3 Mini is a lightweight, state-of-the-art open model built upon datasets used for Phi-2 - synthetic data and filtered websites - with a focus on very high-quality, reasoning-dense data. The model belongs to the Phi-3 model family, and the Mini version comes in two variants, 4K and 128K, which refer to the context length (in tokens) each can support. The model underwent a rigorous enhancement process, incorporating both supervised fine-tuning and direct preference optimization, to ensure precise instruction adherence and robust safety measures.
-
- Optimized Phi-3 Mini models are published here in [ONNX](https://onnx.ai) format to run with [ONNX Runtime](https://onnxruntime.ai/) on CPU and GPU across devices, including server platforms, Windows, Linux, and Mac desktops, and mobile CPUs, with the precision best suited to each of these targets.
-
- [DirectML](https://aka.ms/directml) support lets developers bring hardware acceleration to Windows devices at scale across AMD, Intel, and NVIDIA GPUs. Along with DirectML, ONNX Runtime provides cross-platform support for Phi-3 Mini across a range of devices for CPU, GPU, and mobile.
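For reference, pointing plain ONNX Runtime at DirectML is a one-line change when the inference session is created. This is a minimal sketch, assuming the `onnxruntime-directml` Python package is installed; `model.onnx` is an illustrative path, not a file shipped in this repository:

```python
# Minimal sketch: run an ONNX model on the DirectML execution provider.
# Assumes the onnxruntime-directml package; "model.onnx" is an illustrative path.
import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",
    providers=["DmlExecutionProvider", "CPUExecutionProvider"],  # CPU as fallback
)
print(session.get_providers())  # verify DmlExecutionProvider was picked up
```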
-
- To easily get started with Phi-3, you can use our newly introduced ONNX Runtime Generate() API. See [here](https://aka.ms/generate-tutorial) for instructions on how to run it.
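As a rough illustration of what the linked tutorial walks through, here is a sketch of a streaming generation loop with the `onnxruntime-genai` package. The model folder and prompt are placeholders, and the calls follow the early releases of the Generate() API, so the surface may have shifted in newer versions:

```python
# Sketch of a streaming generation loop with the ONNX Runtime Generate() API.
# Assumes the onnxruntime-genai package; model folder and prompt are placeholders,
# and the calls follow early releases of the API.
import onnxruntime_genai as og

model = og.Model("Phi-3-mini-4k-instruct-onnx-directml")  # local model folder
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()

# Phi-3 chat template: user turn, end marker, then the assistant turn to fill in.
prompt = "<|user|>\nWhat is DirectML?<|end|>\n<|assistant|>\n"

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)
params.input_ids = tokenizer.encode(prompt)

generator = og.Generator(model, params)
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()
    # Decode and print each new token as it is produced.
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
```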
-
- ## ONNX Models
-
- The optimized configurations we have added:
-
- - ONNX model for int4 DML: ONNX model for AMD, Intel, and NVIDIA GPUs on Windows, quantized to int4 using [AWQ](https://arxiv.org/abs/2306.00978); a toy sketch of the idea appears below.
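AWQ's core observation is that a small fraction of weight channels matters disproportionately, so per-channel scales derived from sample activation magnitudes are folded into the weights before 4-bit rounding and folded back out afterwards. The following is a toy NumPy sketch of that idea only, not the pipeline used to produce these models; `alpha` and all names are illustrative:

```python
# Toy sketch of the AWQ idea: activation-aware per-channel scaling around
# 4-bit weight quantization. Illustrative only, not the production pipeline.
import numpy as np

def awq_toy_quantize(W: np.ndarray, X: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """W: [out, in] weights; X: [samples, in] calibration activations."""
    act_mag = np.abs(X).mean(axis=0)          # per-input-channel activation magnitude
    s = np.maximum(act_mag ** alpha, 1e-8)    # activation-aware scales (alpha is grid-searched in the paper)
    Ws = W * s                                # protect salient channels before rounding
    qmax = 7                                  # signed int4 range is [-8, 7]
    step = np.abs(Ws).max(axis=1, keepdims=True) / qmax
    Wq = np.clip(np.round(Ws / step), -8, qmax) * step  # fake-quantize to the int4 grid
    return Wq / s                             # fold the scales back out

# Usage: y ≈ X @ awq_toy_quantize(W, X).T
```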
-
-
- ## Hardware Supported
-
- The models are tested on:
- - GPU SKU: RTX 4090 (DirectML)
-
- Minimum Configuration Required:
- - Windows: DirectX 12-capable GPU and a minimum of 4GB of combined RAM
- - CUDA: NVIDIA GPU with [Compute Capability](https://developer.nvidia.com/cuda-gpus) >= 7.0
-
- ### Model Description
-
- - **Developed by:** Microsoft
- - **Model type:** ONNX
- - **Language(s) (NLP):** Python, C, C++
- - **License:** MIT
- - **Model Description:** This is a conversion of the Phi-3 Mini-4K-Instruct model for ONNX Runtime inference.
-
- ## Additional Details
- - [**ONNX Runtime Optimizations Blog Link**](https://aka.ms/phi3-optimizations)
- - [**Phi-3 Model Blog Link**](https://aka.ms/phi3blog-april)
- - [**Phi-3 Model Card**](https://aka.ms/phi3-mini-4k-instruct)
- - [**Phi-3 Technical Report**](https://aka.ms/phi3-tech-report)
-
 
  ## Performance Metrics
 
@@ -57,16 +23,16 @@ We measured the performance of DirectML on AMD Ryzen 9 7940HS w/ Radeon 78
 
  | Prompt Length | Generation Length | Average Throughput (tps) |
  |---------------------------|-------------------|-----------------------------|
- | 128 | 128 | 53.46686 |
- | 128 | 256 | 53.11233 |
- | 128 | 512 | 57.45816 |
- | 128 | 1024 | 33.44713 |
- | 256 | 128 | 76.50182 |
- | 256 | 256 | 66.68873 |
- | 256 | 512 | 70.83862 |
- | 256 | 1024 | 34.64715 |
- | 512 | 128 | 85.10079 |
- | 512 | 256 | 68.64049 |
  | 512 | 512 | - |
  | 512 | 1024 | - |
  | 1024 | 128 | - |
 
  ---
  license: mit
  pipeline_tag: text-generation
+ tags:
+ - ONNX
+ - DML
+ - ONNXRuntime
+ - phi3
+ - nlp
+ - conversational
+ - custom_code
  inference: false
+ language:
+ - en
  ---
+ # EmbeddedLLM/Phi-3-mini-4k-instruct-onnx-directml
 
  ## Performance Metrics
 
  | Prompt Length | Generation Length | Average Throughput (tps) |
  |---------------------------|-------------------|-----------------------------|
+ | 128 | 128 | - |
+ | 128 | 256 | - |
+ | 128 | 512 | - |
+ | 128 | 1024 | - |
+ | 256 | 128 | - |
+ | 256 | 256 | - |
+ | 256 | 512 | - |
+ | 256 | 1024 | - |
+ | 512 | 128 | - |
+ | 512 | 256 | - |
  | 512 | 512 | - |
  | 512 | 1024 | - |
  | 1024 | 128 | - |
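For context on the metric itself, tokens per second is the usual generated-token count divided by wall-clock generation time. A hedged sketch of how such a number can be collected follows; `run_generation` is a hypothetical stand-in for the actual benchmark harness, which is not published here:

```python
# Illustrative tokens-per-second measurement: new tokens / elapsed wall-clock time.
# run_generation is a hypothetical helper standing in for a real generation loop.
import time
from typing import Callable

def measure_tps(run_generation: Callable[[int, int], None],
                prompt_length: int, generation_length: int) -> float:
    start = time.perf_counter()
    run_generation(prompt_length, generation_length)  # decode N new tokens
    elapsed = time.perf_counter() - start
    return generation_length / elapsed
```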