
The most popular setups, as well as the inference kernels they support, are:

| Kernel | Number of codebooks | Codebook size, bits | Notation | Accuracy | Speedup | Fast GPU inference | Fast CPU inference |
|---|---|---|---|---|---|---|---|
| Triton | K | N | KxN | - | Up to ~0.7x | ✅ | ❌ |
| CUDA | 1 | 16 | 1x16 | Best | Up to ~1.3x | ✅ | ❌ |
| CUDA | 2 | 8 | 2x8 | OK | Up to ~3.0x | ✅ | ❌ |
| Numba | K | 8 | Kx8 | Good | Up to ~4.0x | ❌ | ✅ |
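
To make the `KxN` notation concrete, here is a minimal arithmetic sketch, not part of any library API. It assumes (as a labeled assumption, since the table does not state it) that weights are grouped 8 at a time and that each group is stored as K codes of N bits, with the group reconstructed additively from one vector per codebook. Under that assumption both 1x16 and 2x8 cost 2 bits per weight, but 1x16 needs a 65,536-entry codebook while 2x8 needs only two 256-entry codebooks. The much smaller codebooks are one plausible reason the 2x8 kernel is listed as faster. The class `AqlmSetup` and the `in_group_size` default are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AqlmSetup:
    """Hypothetical helper describing a KxN codebook setup (illustration only)."""

    num_codebooks: int  # K in the table
    nbits_per_codebook: int  # N in the table
    in_group_size: int = 8  # ASSUMPTION: weights are encoded in groups of 8

    @classmethod
    def from_notation(cls, notation: str, in_group_size: int = 8) -> "AqlmSetup":
        # Parse strings such as "1x16" or "2x8" into (K, N).
        k, n = notation.lower().split("x")
        return cls(int(k), int(n), in_group_size)

    def bits_per_weight(self) -> float:
        # Each group of in_group_size weights is stored as K codes of N bits.
        # Scales and the codebooks themselves are ignored here.
        return self.num_codebooks * self.nbits_per_codebook / self.in_group_size

    def codebook_entries(self) -> int:
        # Each of the K codebooks holds 2**N candidate vectors.
        return self.num_codebooks * 2**self.nbits_per_codebook

    def codebook_bytes_fp16(self) -> int:
        # Every entry is an in_group_size-dim vector stored in fp16 (2 bytes each).
        return self.codebook_entries() * self.in_group_size * 2


if __name__ == "__main__":
    for notation in ("1x16", "2x8"):
        setup = AqlmSetup.from_notation(notation)
        print(
            notation,
            setup.bits_per_weight(),
            setup.codebook_entries(),
            setup.codebook_bytes_fp16(),
        )
```

Running the script prints one line per setup: 2.0 bits per weight for both, with about 1 MiB of fp16 codebook for 1x16 versus 8 KiB for 2x8 under the assumed group size of 8.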

AWQ

Try AWQ quantization with this notebook!