awesome-crawler

A collection of awesome web crawlers and spiders in different languages.

7.3k GitHub Stars · 101 Curated Resources


Abot
C# web crawler built for speed and flexibility.
ccrawler
Built with C# 3.5. It contains a simple extension of a web content categorizer, which can separate web pages depending on their content.
DotnetSpider
A cross-platform, lightweight spider developed in C#.
Hawk
Advanced Crawler and ETL tool written in C#/WPF.
Infinity Crawler
A simple but powerful web crawler library in C#.
SimpleCrawler
Simple spider based on multithreading and regular expressions.

ACHE Crawler
An easy to use web crawler for domain-specific search.
anthelion
A plugin for Apache Nutch to crawl semantic annotations within HTML pages.
Apache Nutch
Highly extensible, highly scalable web crawler for production environments.
Crawler4j
Simple and lightweight web crawler.
Gecco
An easy-to-use, lightweight web crawler.
Heritrix3
Extensible, web-scale, archival-quality web crawler project.

aspider
An async web scraping micro-framework based on asyncio.
brownant
A lightweight web data extracting framework.
CoCrawler
A versatile web crawler built using modern tools and concurrency.
cola
A distributed crawling framework.
crawley
Pythonic crawling/scraping framework based on non-blocking I/O operations.
Demiurge
PyQuery-based scraping micro-framework.

Cobweb
Web crawler with very flexible crawling options, running standalone or with Sidekiq.
mechanize
Automated web interaction & crawling.
Nokogiri
A Rubygem providing HTML, XML, SAX, and Reader parsers with XPath and CSS selector support.
RubyRetriever
RubyRetriever is a Web Crawler, Scraper & File Harvester.
Spidr
Spider a site, multiple domains, certain links or infinitely.
upton
A batteries-included framework for easy web scraping. Just add CSS (or do more).

crawlee
A web scraping and browser automation library for Node.js that helps you build reliable crawlers. Fast.
headless-chrome-crawler
Headless Chrome crawls with jQuery support.
js-crawler
Web crawler for Node.js; both HTTP and HTTPS are supported.
node-crawler
Node-crawler has a clean, simple API.
node-osmosis
HTML/XML parser and web scraper for Node.js.
scrape-it
A Node.js scraper for humans.

crawler
Scala DSL for web crawling.
ferrit
Ferrit is a web crawler service written in Scala using Akka, Spray and Cassandra.
scrala
Scala crawler (spider) framework, inspired by Scrapy.

Showing a sample of 101 resources. View the full list on GitHub →