---
language: en
tags:
- embedding
- transformers
- search
- e-commerce
- conversational-search
- semantic-search
license: mit
pipeline_tag: feature-extraction
---

# VectorPath SearchMap: Conversational E-commerce Search Embedding Model

## Model Description

SearchMap is a specialized embedding model designed to make search more conversational and intuitive. We test this hypothesis with a model built for e-commerce search. Fine-tuned from the Stella Embed 400M v5 base model, it excels at understanding natural language queries and matching them with relevant products.

## Key Features

- Optimized for conversational e-commerce queries
- Handles complex, natural language search intents
- Supports multi-attribute product search
- Efficient 1024-dimensional embeddings (configurable up to 8192)
- Specialized for product and hotel search scenarios

## Quick Start

Try out the model in our interactive [Colab Demo](https://colab.research.google.com/drive/1wUQlWgL5R65orhw6MFChxitabqTKIGRu?usp=sharing)!

## Model Details

- Base Model: Stella Embed 400M v5
- Embedding Dimensions: Configurable (512, 768, 1024, 2048, 4096, 6144, 8192)
- Training Data: 100,000+ e-commerce products across 32 categories
- License: MIT
- Framework: PyTorch / Sentence Transformers

## Usage

### Using Sentence Transformers

```python
# Install required packages
!pip install -U torch==2.5.1 transformers==4.44.2 sentence-transformers==2.7.0 xformers==0.0.28.post3

from sentence_transformers import SentenceTransformer

# Initialize the model
model = SentenceTransformer('vectopath/SearchMap_Preview', trust_remote_code=True)

# Encode a query
query = "A treat my dog and I can eat together"
query_embedding = model.encode(query)

# Encode a product description
product_description = "Organic peanut butter dog treats, safe for human consumption..."
product_embedding = model.encode(product_description)
```

### Using with FAISS for Vector Search

```python
import numpy as np
import faiss

# Create a flat (exact) FAISS index over L2 distance
embedding_dimension = 1024  # or your chosen dimension
index = faiss.IndexFlatL2(embedding_dimension)

# Embed and add the product catalog
product_embeddings = model.encode(product_descriptions, show_progress_bar=True)
index.add(np.array(product_embeddings).astype('float32'))

# Retrieve the 10 nearest products for a query
query_embedding = model.encode([query])
distances, indices = index.search(
    np.array(query_embedding).astype('float32'),
    k=10
)
```
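The `IndexFlatL2` index above ranks products by Euclidean distance. If you prefer cosine similarity, a common alternative is to L2-normalize the embeddings and search an inner-product index instead. A minimal sketch, reusing `model`, `product_descriptions`, `query`, and `embedding_dimension` from the snippets above:

```python
import numpy as np
import faiss

# Inner product over L2-normalized vectors is equivalent to cosine similarity
index = faiss.IndexFlatIP(embedding_dimension)

product_embeddings = np.array(
    model.encode(product_descriptions, show_progress_bar=True)
).astype('float32')
faiss.normalize_L2(product_embeddings)  # normalizes in place
index.add(product_embeddings)

query_embedding = np.array(model.encode([query])).astype('float32')
faiss.normalize_L2(query_embedding)

# Map the returned row indices back to the product records
scores, indices = index.search(query_embedding, k=10)
for score, idx in zip(scores[0], indices[0]):
    print(f"{score:.3f}  {product_descriptions[idx]}")
```

Because the vectors are normalized, the returned `scores` are cosine similarities in [-1, 1], with higher values meaning more similar.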
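The Model Details above list several supported embedding dimensions. Assuming these are exposed through Matryoshka-style truncation (an assumption worth verifying against the model's config), sentence-transformers 2.7.0 and later let you request a truncated dimension at load time via `truncate_dim`:

```python
from sentence_transformers import SentenceTransformer

# Load with embeddings truncated to 512 dimensions (illustrative choice);
# smaller dimensions trade some accuracy for memory and search speed
model_512 = SentenceTransformer(
    'vectopath/SearchMap_Preview',
    trust_remote_code=True,
    truncate_dim=512,
)

embedding = model_512.encode("Comfortable shoes for standing all day at work")
print(embedding.shape)  # (512,)
```

Whichever dimension you pick, keep the FAISS `embedding_dimension` in sync with it.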
### Example Search Queries

The model excels at understanding natural language queries like:

- "A treat my dog and I can eat together"
- "Lightweight waterproof hiking backpack for summer trails"
- "Eco-friendly kitchen gadgets for a small apartment"
- "Comfortable shoes for standing all day at work"
- "Cereal for my 4 year old son that likes to miss breakfast"

## Performance and Limitations

### Evaluation

The model's evaluation metrics are available on the [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard).

- The model is currently by far the best embedding model under 1B parameters, and its small memory footprint makes it easy to run locally on a small GPU.
- The model is also No. 1 by a wide margin on the [SemRel24STS](https://huggingface.co/datasets/SemRel/SemRel2024) task with an accuracy of 81.12%, beating the second-place Google Gemini embedding model (73.14%) as of 30 March 2025. SemRel24STS evaluates how well systems measure the semantic relatedness between two sentences across 14 different languages.
- We noticed the model does exceptionally well on the legal and news retrieval and similarity tasks from the MTEB leaderboard.

### Strengths

- Excellent at understanding conversational and natural language queries
- Strong performance in e-commerce and hotel search scenarios
- Handles complex multi-attribute queries
- Efficient computation with configurable embedding dimensions

### Current Limitations

- May not fully prioritize weighted terms in queries
- Limited handling of slang and colloquial language
- Regional language variations might need fine-tuning

## Training Details

The model was trained using:

- Supervised learning with Sentence Transformers
- A dataset of 100,000+ products across 32 categories
- AI-generated conversational search queries
- Positive and negative product examples for contrastive learning

## Intended Use

This model is designed for:

- E-commerce product search and recommendations
- Hotel and accommodation search
- Product catalog vectorization
- Semantic similarity matching
- Query understanding and intent detection

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{vectorpath2025searchmap,
  title={SearchMap: Conversational E-commerce Search Embedding Model},
  author={VectorPath Research Team},
  year={2025},
  publisher={Hugging Face},
  journal={HuggingFace Model Hub},
}
```

## Contact and Community

- Discord Community: [Join our Discord](https://discord.gg/gXvVfqGD)
- GitHub Issues: Report bugs and feature requests
- Interactive Demo: [Try it on Colab](https://colab.research.google.com/drive/1wUQlWgL5R65orhw6MFChxitabqTKIGRu?usp=sharing)

## License

This model is released under the MIT License. See the LICENSE file for more details.