A collection of awesome web crawlers and spiders in different languages

7.2k GitHub Stars
101 Curated Resources
14 Categories
Last Refreshed: 4 hours ago
Python, Java, C#, JavaScript, PHP, C++, C, Ruby, Rust, R, Erlang, Perl, Go, Scala

What's inside

C#

  • Abot

    C# web crawler built for speed and flexibility.

  • ccrawler

    Built in C# 3.5. It includes a simple web content categorizer extension that can distinguish between web pages based on their content.

  • DotnetSpider

    A cross-platform, lightweight spider developed in C#.

  • Hawk

    Advanced Crawler and ETL tool written in C#/WPF.

  • Infinity Crawler

    A simple but powerful web crawler library in C#.

  • SimpleCrawler

    Simple spider based on multithreading and regular expressions.

Java

  • ACHE Crawler

    An easy to use web crawler for domain-specific search.

  • anthelion

    A plugin for Apache Nutch to crawl semantic annotations within HTML pages.

  • Apache Nutch

    Highly extensible, highly scalable web crawler for production environments.

  • Crawler4j

    Simple and lightweight web crawler.

  • Gecco

    An easy-to-use, lightweight web crawler.

  • Heritrix3

    Extensible, web-scale, archival-quality web crawler project.

Go

  • ants-go

    An open source, distributed, RESTful crawler engine in Go.

  • colly

    Fast and Elegant Scraping Framework for Gophers.

  • creeper

    The Next Generation Crawler Framework (Go).

  • Dataflow kit

    Extract structured data from web pages. Website scraping.

  • dht

    BitTorrent DHT Protocol && DHT Spider.

  • ferret

    Declarative web scraping.

Python

  • aspider

    An async web scraping micro-framework based on asyncio.

  • brownant

    A lightweight web data extracting framework.

  • CoCrawler

    A versatile web crawler built using modern tools and concurrency.

  • cola

    A distributed crawling framework.

  • crawley

    Pythonic Crawling / Scraping Framework based on Non Blocking I/O operations.

  • Demiurge

    PyQuery-based scraping micro-framework.

Ruby

  • Cobweb

    Web crawler with very flexible crawling options, standalone or using sidekiq.

  • mechanize

    Automated web interaction & crawling.

  • Nokogiri

    A Rubygem providing HTML, XML, SAX, and Reader parsers with XPath and CSS selector support.

  • RubyRetriever

    RubyRetriever is a Web Crawler, Scraper & File Harvester.

  • Spidr

    Spider a site, multiple domains, certain links or infinitely.

  • upton

    A batteries-included framework for easy web scraping. Just add CSS (or do more).

JavaScript

  • crawlee

    A web scraping and browser automation library for Node.js that helps you build reliable crawlers. Fast.

  • headless-chrome-crawler

    Headless Chrome crawls with jQuery support

  • js-crawler

    Web crawler for Node.js; both HTTP and HTTPS are supported.

  • node-crawler

    Node-crawler has a clean, simple API.

  • node-osmosis

    HTML/XML parser and web scraper for Node.js.

  • scrape-it

    A Node.js scraper for humans.

Scala

  • crawler

    Scala DSL for web crawling.

  • ferrit

    Ferrit is a web crawler service written in Scala using Akka, Spray and Cassandra.

  • scrala

    Scala crawler (spider) framework, inspired by Scrapy.

Rust

  • crawler

    A gRPC web indexer turbocharged for performance.

  • spider

    The fastest web crawler and indexer.

Showing a sample of 101 resources. View the full list on GitHub →