New Python MCP Engine Unlocks 6 Search Sources

📁 AI Applications · ⏱️ 9 min read
💡 Developer rewrites TypeScript search tool in Python, adding Google, Baidu, and Yahoo support plus CAPTCHA handling.

A new open-source project aims to address several limitations in current AI agent search tools. Developer Duan Shiwen has released a Python rewrite of the popular web-search-mcp project.

This new tool, named 'search-mcp-craft-agent', addresses significant gaps in the original TypeScript version. It introduces support for Google, Baidu, and Yahoo alongside existing engines. The update also features robust CAPTCHA handling and improved browser management.

Overcoming TypeScript Limitations

The original web-search-mcp project, written in TypeScript, gained traction with over 900 stars on GitHub. However, developers soon encountered friction points that hindered complex workflows. The primary issue was the lack of support for major search engines such as Google and China's Baidu.

Many Western developers rely on Bing or DuckDuckGo, but these engines do not always return the depth of results that specialized research needs. Without Google support, agents cannot query its index at all. Furthermore, the original tool struggled with automated verification challenges.

When a search engine triggered a CAPTCHA, the entire process would fail immediately. This rigidity made the tool unreliable for production environments where consistency is key. The page extraction logic was also tightly coupled with the search flow, limiting flexibility for custom data processing needs.

Duan Shiwen identified these bottlenecks during active development. Rather than patching the existing codebase, he chose a complete rewrite. This approach allowed for architectural improvements that would have been hard to retrofit into the original TypeScript codebase.

Key Technical Improvements

The move from TypeScript to Python brings several practical benefits for AI developers. Python remains the dominant language in the artificial intelligence sector, and most large language model (LLM) frameworks and libraries are Python-first.

Aligning the search tool with this ecosystem simplifies integration: a Python agent can use the tool without managing cross-language bridges or a separate compilation step. The new tool requires Python 3.10 or higher, so it can rely on modern async programming patterns.

Here are the core technical enhancements in the new release:

  • Expanded Engine Support: Adds Google, Baidu, and Yahoo to the existing Bing, Brave, and DuckDuckGo options.
  • CAPTCHA Resilience: Implements automatic detection and pauses execution for manual user verification via popup.
  • Isolated Browser Instances: Uses per-engine independent processes instead of shared instances to prevent state conflicts.
  • Global Queue Locking: Manages concurrent requests efficiently to avoid rate-limiting bans.
  • Decoupled Extraction: Separates search retrieval from page content parsing for modular workflow design.

Together, these changes make the tool more dependable as a building block for agent pipelines. The isolation of browser instances is particularly important: shared instances in the previous version often led to cookie contamination across different search queries.
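The "Decoupled Extraction" item can be illustrated with a minimal sketch. It assumes a two-stage design in which search returns only result metadata and page parsing is a separate function; the names `SearchResult`, `search`, and `extract_text` are hypothetical and not taken from the project.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Callable


@dataclass(frozen=True)
class SearchResult:
    """One search hit: metadata only, no page body (hypothetical type)."""
    title: str
    url: str
    snippet: str


class _TextExtractor(HTMLParser):
    """Collects visible text and skips <script> and <style> bodies."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def search(query: str, backend: Callable[[str], list[SearchResult]]) -> list[SearchResult]:
    """Stage 1: return hits only. The caller decides whether to fetch pages."""
    return backend(query)


def extract_text(html: str) -> str:
    """Stage 2: turn raw HTML into plain text. Knows nothing about search."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Because neither stage calls the other, an agent could run a search, filter the hits, and only then fetch and parse the pages it actually needs.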

Enhanced CAPTCHA Handling Strategy

One of the biggest hurdles in automated web search is bot detection. Major search engines use sophisticated algorithms to identify non-human traffic, and the original web-search-mcp had no way to recover from these challenges.

Any encounter with a CAPTCHA caused the search to fail immediately, which made the tool impractical for sustained operation. The new Python implementation handles the challenge instead of aborting.

The system now detects when a verification challenge arises. Instead of failing, it triggers a local popup window. This allows a human operator to manually solve the puzzle without stopping the broader script.
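The detect-then-pause flow described here can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the marker strings, `looks_like_captcha`, and `fetch_with_manual_verification` are hypothetical, and the popup is represented by a caller-supplied `wait_for_human` callback.

```python
from typing import Callable

# Assumed marker strings; real engines use different and changing markup.
CAPTCHA_MARKERS = ("captcha", "unusual traffic", "verify you are human")


def looks_like_captcha(html: str) -> bool:
    """Heuristic check: does the page contain a known challenge marker?"""
    lowered = html.lower()
    return any(marker in lowered for marker in CAPTCHA_MARKERS)


def fetch_with_manual_verification(
    fetch: Callable[[], str],
    wait_for_human: Callable[[], None],
    max_attempts: int = 3,
) -> str:
    """Fetch a results page, pausing for a human whenever a challenge appears.

    `fetch` loads the page in the engine's browser session.
    `wait_for_human` blocks until the operator reports the challenge solved
    (for example, a popup window or a console prompt).
    """
    for _ in range(max_attempts):
        html = fetch()
        if not looks_like_captcha(html):
            return html
        # Pause here instead of raising, so the wider script keeps its state.
        wait_for_human()
    raise RuntimeError(f"CAPTCHA still present after {max_attempts} attempts")
```

In a console setting, `wait_for_human` could be as simple as `lambda: input("Solve the CAPTCHA in the browser, then press Enter")`.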

This hybrid approach balances automation with reliability. It ensures that long-running research tasks do not terminate unexpectedly. For businesses relying on AI agents for market intelligence, this feature is invaluable.

It reduces the need for expensive proxy services or third-party solving APIs. The manual intervention step acts as a safety valve. It maintains the integrity of the session while preserving the efficiency of the automated pipeline.

Browser Management and Performance

Efficient resource management is critical when running multiple search agents simultaneously. The original TypeScript version utilized a shared browser instance. This architecture created contention issues when handling parallel requests.

The new Python version adopts a more granular approach. It implements a global queue lock to manage request timing. This prevents overwhelming target servers and reduces the risk of IP bans.
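One way to picture a global queue lock is an `asyncio.Lock` combined with a minimum gap between requests. The sketch below assumes that design; the class name `GlobalQueueLock` and the `min_interval` parameter are hypothetical and may not match how the project paces requests.

```python
import asyncio
import time


class GlobalQueueLock:
    """Serializes requests and enforces a minimum gap between them."""

    def __init__(self, min_interval: float) -> None:
        self._lock = asyncio.Lock()
        self._min_interval = min_interval
        self._last_release = float("-inf")

    async def __aenter__(self) -> None:
        await self._lock.acquire()
        try:
            # Sleep off whatever remains of the gap since the last request.
            wait = self._last_release + self._min_interval - time.monotonic()
            if wait > 0:
                await asyncio.sleep(wait)
        except BaseException:
            # Do not leave the lock held if the wait is cancelled.
            self._lock.release()
            raise

    async def __aexit__(self, exc_type, exc, tb) -> None:
        self._last_release = time.monotonic()
        self._lock.release()


async def run_demo(n: int = 3, min_interval: float = 0.05) -> list[float]:
    """Launch n concurrent 'searches' and record when each one starts."""
    gate = GlobalQueueLock(min_interval)
    starts: list[float] = []

    async def one_search() -> None:
        async with gate:
            starts.append(time.monotonic())  # a real search would run here

    await asyncio.gather(*(one_search() for _ in range(n)))
    return starts
```

Even though all coroutines are launched at once, each one enters the guarded section only after the previous one has finished and the interval has elapsed.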

Furthermore, each search engine operates within its own independent process. This isolation ensures that a failure in one engine does not cascade to others. If Google blocks a request, Bing can continue functioning normally.
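The failure isolation described above can be demonstrated with plain subprocesses. In this sketch the one-line worker scripts are hypothetical stand-ins for per-engine browser sessions, and the simulated Google worker exits with an error to mimic a blocked request.

```python
import subprocess
import sys

# Hypothetical worker scripts; a real worker would drive a browser session.
ENGINE_WORKERS = {
    "google": "import sys; sys.exit('blocked')",  # simulates a blocked request
    "bing": "print('bing: 3 results')",
}


def run_engine(name: str, timeout: float = 10.0) -> dict[str, object]:
    """Run one engine's worker in its own Python process."""
    proc = subprocess.run(
        [sys.executable, "-c", ENGINE_WORKERS[name]],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return {
        "ok": proc.returncode == 0,
        "output": proc.stdout.strip(),
        "error": proc.stderr.strip(),
    }


def search_all(engines: list[str]) -> dict[str, dict[str, object]]:
    """Query every engine; a failure in one is recorded, not propagated."""
    results: dict[str, dict[str, object]] = {}
    for name in engines:
        try:
            results[name] = run_engine(name)
        except subprocess.TimeoutExpired:
            results[name] = {"ok": False, "output": "", "error": "timeout"}
    return results
```

Calling `search_all(["google", "bing"])` reports the Google failure in its own entry while the Bing worker still returns its output.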

This modular design enhances overall system stability. It allows developers to configure specific timeout settings for each engine. Such flexibility is essential for optimizing performance across different geographic regions.

Developers can now tailor their search strategies based on specific needs. For example, Baidu might require different headers than Google. The decoupled architecture supports these nuanced configurations effortlessly.
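Per-engine settings of this kind could be expressed as a small configuration table. The sketch below is illustrative only: `EngineConfig`, `config_for`, and all timeout and header values are assumptions, not settings taken from the project.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EngineConfig:
    """Settings applied to a single search engine (hypothetical schema)."""
    timeout_s: float = 10.0
    headers: dict[str, str] = field(default_factory=dict)


DEFAULT_CONFIG = EngineConfig()

# Illustrative values only; tune them for your own network and region.
ENGINE_CONFIGS: dict[str, EngineConfig] = {
    "google": EngineConfig(
        timeout_s=8.0,
        headers={"Accept-Language": "en-US,en;q=0.9"},
    ),
    "baidu": EngineConfig(
        timeout_s=15.0,
        headers={"Accept-Language": "zh-CN,zh;q=0.9"},
    ),
}


def config_for(engine: str) -> EngineConfig:
    """Return the engine's own settings, falling back to the defaults."""
    return ENGINE_CONFIGS.get(engine, DEFAULT_CONFIG)
```

Keeping the overrides in one table means adding an engine or adjusting a timeout does not touch the search code itself.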

Industry Context and Implications

The rise of AI agents has increased demand for reliable information retrieval tools. Large language models (LLMs) excel at reasoning but lack real-time data access. They depend on external tools, such as Model Context Protocol (MCP) servers, to fetch current information.

Current MCP solutions often lag behind the rapid evolution of web technologies. Many are built on older frameworks or lack comprehensive engine support. This gap creates a bottleneck for advanced AI applications.

Tools like search-mcp-craft-agent aim to fill this gap with broader engine coverage and more resilient failure handling. They enable developers to build research assistants that can navigate complex web environments.

For Western companies, access to diverse search sources is strategic. While Google dominates globally, regional engines offer unique insights. Integrating these sources provides a competitive edge in data analysis.

The shift towards Python-centric tools also reflects broader industry trends. As AI infrastructure consolidates around Python, auxiliary tools must adapt. This alignment simplifies deployment and maintenance for DevOps teams.

Gogo's Take

  • 🔥 Why This Matters: This tool broadens access to multi-engine search for AI agents. By supporting Google and Baidu, it gives LLMs access to results that the original tool could not reach. The CAPTCHA handling feature reduces operational friction, making autonomous agents more viable for real-world business use cases.
  • ⚠️ Limitations & Risks: Relying on manual CAPTCHA solving introduces latency. If unattended operation is required, this solution may not suffice without additional automation layers. Additionally, scraping policies vary by region; developers must ensure compliance with local laws and terms of service for each engine.
  • 💡 Actionable Advice: Developers building AI research agents should evaluate this Python rewrite. Compare its success rate against the original TypeScript version in your specific workflow. Rely on the built-in global queue lock to protect your IP reputation when scaling up search volumes.
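For the success-rate comparison suggested above, a small harness is enough. This is a sketch under one assumption: each tool is wrapped in a callable that takes a query and returns a list of results. The name `success_rate` and the definition of success (a non-empty result without an exception) are choices made for this example.

```python
from typing import Callable, Iterable


def success_rate(search: Callable[[str], list], queries: Iterable[str]) -> float:
    """Fraction of queries that return at least one result without raising."""
    query_list = list(queries)
    if not query_list:
        raise ValueError("at least one query is required")
    succeeded = 0
    for query in query_list:
        try:
            if search(query):
                succeeded += 1
        except Exception:
            # Count any failure (timeout, block, parse error) as a miss.
            pass
    return succeeded / len(query_list)
```

Running the same query list through a wrapper for each tool and comparing the two rates gives a like-for-like measure for your own workload.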