---
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
- phobert
- vietnamese
- sentence-embedding
license: apache-2.0
language:
- vi
metrics:
- pearsonr
- spearmanr
---

# Vietnamese Embedding ONNX

This repository contains the ONNX version of the [dangvantuan/vietnamese-embedding](https://huggingface.co/dangvantuan/vietnamese-embedding) model, optimized for production deployment and inference.

## Model Description

`laituanmanh32/vietnamese-embedding-onnx` is an ONNX-converted version of the original Vietnamese embedding model created by dangvantuan. The original model is a sentence-embedding model trained specifically for Vietnamese, built on PhoBERT (a pre-trained language model based on the RoBERTa architecture).

The model encodes Vietnamese sentences into a 768-dimensional vector space, supporting a wide range of applications:

- Semantic search
- Text clustering
- Document similarity
- Question answering
- Information retrieval

## Why ONNX?

The Open Neural Network Exchange (ONNX) format provides several advantages:

- **Improved inference speed**: Optimized for production environments
- **Cross-platform compatibility**: Run the model on a variety of hardware and software platforms
- **Reduced dependencies**: No need for the full PyTorch ecosystem
- **Smaller deployment size**: More efficient for production systems
- **Hardware acceleration**: Better utilization of CPU/GPU resources

## Usage

### Installation

```bash
pip install onnxruntime transformers pyvi
```

### Basic Usage

```python
from transformers import AutoTokenizer
import onnxruntime as ort
from pyvi.ViTokenizer import tokenize

# Load the tokenizer and the ONNX model
tokenizer = AutoTokenizer.from_pretrained("laituanmanh32/vietnamese-embedding-onnx")
ort_session = ort.InferenceSession("path/to/model.onnx")

# Word-segment the input sentences with pyvi, as the original model expects
sentences = ["Hà Nội là thủ đô của Việt Nam", "Đà Nẵng là thành phố du lịch"]
tokenized_sentences = [tokenize(sent) for sent in sentences]

# Tokenize into NumPy arrays, which is what ONNX Runtime expects
encoded_input = tokenizer(tokenized_sentences, padding=True, truncation=True, return_tensors="np")
inputs = dict(encoded_input)

# Run inference
outputs = ort_session.run(None, inputs)
embeddings = outputs[0]

# Use the embeddings for your downstream tasks
print(embeddings.shape)  # (2, 768) if the exported graph pools internally,
                         # (2, seq_len, 768) if it returns token-level states
```

If the exported graph returns token-level hidden states rather than pooled sentence vectors, apply mean pooling as sketched below.
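### Pooling and Normalization

The original sentence-transformers model applies mean pooling over token embeddings; whether the ONNX graph already does this depends on how it was exported. The following is a minimal sketch under the assumption that `outputs[0]` has shape `(batch, seq_len, 768)` and still needs pooling; the helper names are illustrative, not part of this repository.

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings, ignoring padding positions."""
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)                   # (batch, 768)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                   # avoid division by zero
    return summed / counts

def l2_normalize(embeddings: np.ndarray) -> np.ndarray:
    """Scale each vector to unit length so dot products equal cosine similarities."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / np.clip(norms, 1e-9, None)

# Assumes token-level output; skip this step if embeddings.shape is already (2, 768)
sentence_embeddings = l2_normalize(mean_pool(outputs[0], encoded_input["attention_mask"]))
```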
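### Computing Similarity

With pooled, L2-normalized embeddings, semantic similarity reduces to a dot product. A short usage example, continuing from the sketch above:

```python
# With unit-length vectors, the dot product equals the cosine similarity
similarity = float(sentence_embeddings[0] @ sentence_embeddings[1])
print(f"Similarity between the two sentences: {similarity:.4f}")
```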
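### Measuring Latency

Inference speed depends heavily on hardware and batch size, so it is worth benchmarking on your own machine. A quick sketch, reusing `ort_session` and `inputs` from the example above:

```python
import time

# Average wall-clock time over repeated runs; the figures in the
# Performance section below will differ across machines
runs = 50
start = time.perf_counter()
for _ in range(runs):
    ort_session.run(None, inputs)
per_batch_ms = (time.perf_counter() - start) / runs * 1000
print(f"{per_batch_ms:.2f} ms per batch of {len(sentences)} sentences")
```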
## Performance

The ONNX version maintains the accuracy of the original model while improving inference speed:

| Model | Inference Time (ms/sentence) | Memory Usage |
|-------|------------------------------|--------------|
| Original PyTorch | 15-20 ms | ~500 MB |
| ONNX | 5-10 ms | ~200 MB |

*Note: Performance may vary depending on hardware and batch size.*

## Original Model Performance

The original model achieves state-of-the-art performance on Vietnamese semantic textual similarity tasks:

**Pearson score**

| Model | STSB | STS12 | STS13 | STS14 | STS15 | STS16 | SICK | Mean |
|-------|------|-------|-------|-------|-------|-------|------|------|
| dangvantuan/vietnamese-embedding | 84.87 | 87.23 | 85.39 | 82.94 | 86.91 | 79.39 | 82.77 | 84.21 |

## Conversion Process

This model was converted from the original PyTorch model to ONNX format using ONNX Runtime and PyTorch's built-in ONNX export functionality. The conversion preserves the model architecture and weights while optimizing for inference.

## Citation

If you use this model, please cite the original work:

```
@article{reimers2019sentence,
  title={Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks},
  author={Reimers, Nils and Gurevych, Iryna},
  journal={arXiv preprint arXiv:1908.10084},
  year={2019}
}
```

## License

This model is released under the same license as the original model: Apache 2.0.

## Acknowledgements

Special thanks to [dangvantuan](https://huggingface.co/dangvantuan) for creating and sharing the original Vietnamese embedding model on which this work is based.