File size: 762 Bytes

ff6048b

---
tags:
- image-captioning
- deep-learning
- pytorch
- encoder-decoder
- vision
---

# 🖼️ Image Captioning Model

This is a deep learning-based **image captioning model** trained using a **CNN Encoder + LSTM Decoder** architecture. The model generates captions for input images based on visual features extracted by a Convolutional Neural Network (CNN).

## 📌 Model Details
- **Model Type**: Image Captioning  
- **Architecture**: CNN Encoder + LSTM Decoder  
- **Framework**: PyTorch  
- **Input**: Image (`.jpg`, `.png`, etc.)  
- **Output**: Generated caption (text)  
- **Vocabulary**: Pre-trained vocabulary file  

## 🚀 How to Use
### **1️⃣ Install Dependencies**
```bash
pip install torch torchvision transformers huggingface_hub pickle5