An Awesome List for getting started with web archiving

2.6k GitHub stars · 185 curated resources · 6 categories · last refreshed 16 hours ago
Categories: Training/Documentation, Resources for Web Publishers, Tools & Software, Community Resources, Web Archiving Service Providers, Public Data

Use this list with your AI agent

Add the Context Awesome MCP server to Claude, Cursor, or any MCP client, then ask:

"Show me acquisition resources from awesome-web-archiving"

Installation instructions →

What's inside

Tools & Software

  • ArchiveBox (Acquisition)

    A tool which maintains an additive archive from RSS feeds, bookmarks, and links using wget, Chrome headless, and other methods (formerly

  • archivenow (Acquisition)

    A

  • ArchiveSpark (Analysis)

    An Apache Spark framework for web archives (though not limited to them) that enables easy data processing, extraction, and derivation.

  • Archives Research Compute Hub (Analysis)

    Web application for distributed compute analysis of Archive-It web archive collections.

  • Archives Unleashed Notebooks (Analysis)

    Notebooks for working with web archives using the Archives Unleashed Toolkit and the derivatives it generates.

  • Archives Unleashed Toolkit (Analysis)

    Archives Unleashed Toolkit (AUT) is an open-source platform for analyzing web archives with Apache Spark.

Web Archiving Service Providers

Resources for Web Publishers

Community Resources

Public Data

Showing a sample of 185 resources. View the full list on GitHub →