---
title: Web Search MCP
emoji: 🔎
colorFrom: red
colorTo: green
sdk: gradio
sdk_version: 5.36.2
app_file: app.py
pinned: false
short_description: Search and extract web content for LLM ingestion
---

# Web Search MCP Server

A Model Context Protocol (MCP) server that provides web search capabilities to LLMs, allowing them to fetch and extract content from web pages and news articles.

## Features

- **Dual search modes**:
  - **General Search**: Get diverse results from blogs, documentation, articles, and more
  - **News Search**: Find fresh news articles and breaking stories from news sources
- **Real-time web search**: Search for any topic with up-to-date results
- **Content extraction**: Automatically extracts main article content, removing ads and boilerplate
- **Rate limiting**: Built-in rate limiting (200 requests/hour) to prevent API abuse
- **Structured output**: Returns formatted content with metadata (title, source, date, URL)
- **Flexible results**: Control the number of results (1-20)

## Prerequisites

1. **Serper API Key**: Sign up at [serper.dev](https://serper.dev) to get your API key
2. **Python 3.8+**: Ensure you have Python installed
3. **MCP-compatible LLM client**: Such as Claude Desktop, Cursor, or any MCP-enabled application

## Installation

1. Clone or download this repository
2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

   Or install manually:

   ```bash
   pip install "gradio[mcp]" httpx trafilatura python-dateutil limits
   ```

3. Set your Serper API key:

   ```bash
   export SERPER_API_KEY="your-api-key-here"
   ```

## Usage

### Starting the MCP Server

```bash
python app_mcp.py
```

The server will start on `http://localhost:7860` with the MCP endpoint at:

```
http://localhost:7860/gradio_api/mcp/sse
```

### Connecting to LLM Clients

#### Claude Desktop

Add to your `claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "web-search": {
      "command": "python",
      "args": ["/path/to/app_mcp.py"],
      "env": {
        "SERPER_API_KEY": "your-api-key-here"
      }
    }
  }
}
```

#### Direct URL Connection

For clients that support URL-based MCP servers:

1. Start the server: `python app_mcp.py`
2. Connect to: `http://localhost:7860/gradio_api/mcp/sse`

## Tool Documentation

### `search_web` Function

**Purpose**: Search the web for information or fresh news and extract content.

**Parameters**:

- `query` (str, **REQUIRED**): The search query
  - Examples: "OpenAI news", "climate change 2024", "python tutorial"
- `num_results` (int, **OPTIONAL**): Number of results to fetch
  - Default: 4
  - Range: 1-20
  - More results provide more context but take longer
- `search_type` (str, **OPTIONAL**): Type of search to perform
  - Default: "search" (general web search)
  - Options: "search" or "news"
  - Use "news" for fresh, time-sensitive news articles
  - Use "search" for general information, documentation, tutorials

**Returns**: Formatted text containing:

- Summary of extraction results
- For each article:
  - Title
  - Source and date
  - URL
  - Extracted main content

**When to use each search type**:

- **Use "news" mode for**:
  - Breaking news or very recent events
  - Time-sensitive information ("today", "this week")
  - Current affairs and latest developments
  - Press releases and announcements
- **Use "search" mode for**:
  - General information and research
  - Technical documentation or tutorials
  - Historical information
  - Diverse perspectives from various sources
  - How-to guides and explanations

**Example Usage in LLM**:

```
# News mode examples
"Search for breaking news about OpenAI" -> uses news mode
"Find today's stock market updates" -> uses news mode
"Get latest climate change developments" -> uses news mode

# Search mode examples (default)
"Search for Python programming tutorials" -> uses search mode
"Find information about machine learning algorithms" -> uses search mode
"Research historical data about climate change" -> uses search mode
```

## Error Handling

The tool handles various error scenarios:

- Missing API key: Clear error message with setup instructions
- Rate limiting: Informs when limit is exceeded
- Failed extractions: Reports which articles couldn't be extracted
- Network errors: Graceful error messages

## Testing

You can test the server manually:

1. Open `http://localhost:7860` in your browser
2. Enter a search query
3. Adjust the number of results
4. Click "Search" to see the extracted content

## Tips for LLM Usage

1. **Choose the right search type**: Use "news" for fresh, breaking news; use "search" for general information
2. **Be specific with queries**: More specific queries yield better results
3. **Adjust result count**: Use fewer results for quick searches, more for comprehensive research
4. **Check dates**: The tool shows article dates for temporal context
5. **Follow up**: Use the extracted content to ask follow-up questions

## Limitations

- Rate limited to 200 requests per hour
- Extraction quality depends on website structure
- Some websites may block automated access
- News mode focuses on recent articles from news sources
- Search mode provides diverse results but may include older content

## Troubleshooting

1. **"SERPER_API_KEY is not set"**: Ensure the environment variable is exported
2. **Rate limit errors**: Wait before making more requests
3. **No content extracted**: Some websites block scrapers; try different queries
4. **Connection errors**: Check your internet connection and firewall settings
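
To rule out the first issue and catch out-of-range arguments before a request is sent, a small preflight check can help. The sketch below is illustrative only and is not part of this repository: `preflight` is a hypothetical helper, and the server's own validation may behave differently. It relies only on values documented in this README (the `SERPER_API_KEY` variable, `num_results` between 1 and 20, and `search_type` of "search" or "news").

```python
import os
from typing import List, Mapping, Optional

# Limits copied from the Tool Documentation section of this README.
VALID_SEARCH_TYPES = ("search", "news")
MIN_RESULTS, MAX_RESULTS = 1, 20


def preflight(
    query: str,
    num_results: int = 4,
    search_type: str = "search",
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return a list of problems found; an empty list means nothing was flagged.

    Hypothetical helper for illustration; not the server's actual validation.
    """
    # Fall back to the real process environment unless a mapping is supplied.
    env = os.environ if env is None else env
    problems = []
    if not env.get("SERPER_API_KEY"):
        problems.append("SERPER_API_KEY is not set")
    if not query or not query.strip():
        problems.append("query must be a non-empty string")
    if not MIN_RESULTS <= num_results <= MAX_RESULTS:
        problems.append(
            f"num_results must be between {MIN_RESULTS} and {MAX_RESULTS}"
        )
    if search_type not in VALID_SEARCH_TYPES:
        problems.append(f"search_type must be one of {VALID_SEARCH_TYPES}")
    return problems
```

Calling `preflight("OpenAI news", num_results=4, search_type="news")` in the same shell that will launch the server returns an empty list when the key is exported and the arguments are within the documented ranges. Passing `env` explicitly makes the check easy to exercise without touching the real environment.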