MULTI-ENGINE WEB SCRAPER
One engine isn't enough
A client needed reliable data from 50+ websites — news, product pages, forums, APIs. The problem: no single scraping tool handles all of them. Some sites block headless browsers. Some require JavaScript rendering. Some have bot detection that only a managed service can bypass. The client was maintaining three separate scripts with three output formats. Every new site meant figuring out which script to modify.
What I built
A scraping layer with one REST API and three engines behind it. Send a URL, get structured data back. The system picks the right engine: Crawl4AI for JavaScript-heavy pages, Scrapy for high-volume bulk crawls, Firecrawl for sites with aggressive bot detection. Every result returns in the same schema regardless of which engine ran.
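The "same schema regardless of engine" idea can be sketched as a normalized result type. This is an illustrative shape, not the client's actual field names:

```python
# Sketch of a unified result schema across three engines.
# Field names here are illustrative assumptions, not the production schema.
from dataclasses import dataclass, field
from enum import Enum

class Engine(str, Enum):
    CRAWL4AI = "crawl4ai"    # JavaScript-heavy pages
    SCRAPY = "scrapy"        # high-volume bulk crawls
    FIRECRAWL = "firecrawl"  # sites with aggressive bot detection

@dataclass
class ScrapeResult:
    """Every engine's output is normalized into this one shape."""
    url: str
    engine: Engine                 # which engine actually ran
    status: int                    # HTTP status of the final fetch
    title: str = ""
    content: str = ""              # extracted main text
    metadata: dict = field(default_factory=dict)

result = ScrapeResult(url="https://example.com",
                      engine=Engine.CRAWL4AI, status=200)
```

Because every engine adapter emits this one type, the downstream consumer never branches on which engine ran.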
The routing logic
Engine selection works on three levels. First, URL pattern matching against rules the client configures for known sites. Second, content-type detection from a lightweight HEAD request. Third, a fallback chain: if Crawl4AI fails, the request retries with Scrapy, then Firecrawl. This handles about 95% of sites without manual intervention, and the client can force a specific engine through the API for edge cases they've already identified.
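The three levels can be sketched as follows. The engine names are real; the rule set, function names, and fallback details are illustrative assumptions:

```python
# Minimal sketch of the three-level routing plus fallback chain.
# URL_RULES and the engine callables are stand-ins, not production code.
import fnmatch

FALLBACK_ORDER = ["crawl4ai", "scrapy", "firecrawl"]

# Level 1: client-configured URL patterns mapped to a preferred engine.
URL_RULES = {"*.example-forum.com/*": "scrapy"}

def pick_engine(url, forced=None, content_type=""):
    if forced:                                  # API override for known edge cases
        return forced
    for pattern, engine in URL_RULES.items():   # level 1: URL pattern match
        if fnmatch.fnmatch(url, pattern):
            return engine
    if content_type and "text/html" not in content_type:
        return "scrapy"                         # level 2: non-HTML -> bulk fetcher
    return FALLBACK_ORDER[0]                    # default first link in the chain

def scrape_with_fallback(url, engines, start):
    # Level 3: walk the chain starting at the chosen engine.
    for name in FALLBACK_ORDER[FALLBACK_ORDER.index(start):]:
        try:
            return name, engines[name](url)
        except Exception:
            continue                            # this engine failed; try the next
    raise RuntimeError(f"all engines failed for {url}")
```

A forced engine still participates in the chain from its own position, so an override never removes the safety net behind it.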
Where it is now
Deployed on Render, running production scraping workloads. The client's downstream pipeline doesn't know which engine executed. It just gets clean, normalized data.