Python utilities for reading WARC and CDX files and converting WARC files to CDX files, developed by David Bern. No documentation.
Go utilities for working with WARC files, developed by Kevin Bullaughey. No documentation.
Python library for converting on-disk directories of web files into WARC files, developed by Ilya Kreymer. Includes a brief usage guide.
Python utilities for WARC validation, summarization, filtering, compression, conversion from ARC format, and indexing, previously under development by Hanzo Archives with funding from the IIPC. Includes a brief usage guide.
Java utilities for working with WARC files, collaboratively maintained by members of the IIPC. No documentation.
Python software providing Wayback-like access and optional archiving proxy functionality for live web content, developed by Ilya Kreymer. Includes enhancements for higher-fidelity replay of complex dynamic websites and is natively Memento-compliant.
Python software for Windows and OS X for local Wayback-like access to archived web content, developed by Ilya Kreymer.
Java/Scala software built on Spark for web archive analysis, developed by Helge Holzmann and Vinay Goel. The processing pipeline leverages CDX indices to determine what subset of a larger corpus of WARC files should actually be ingested for data extraction. Includes a brief usage guide. Compatible with Jupyter.
Java software built on Hadoop, Pig, and SQLite for web archive analysis, developed by Andreas Paepcke. Data extracted from WebBase or WARC files using Pig is stored in and queried from a SQLite database. Users perform analyses using a spreadsheet interface overlay. Includes a setup guide and brief usage guide.
Java software leveraging the webarchive-discovery indexer to provide keyword searching, faceting, and trend analysis (akin to the Google Ngram Viewer) in an integrated user interface, developed by the British Library (UK Web Archive). Includes a setup guide.
Java software built on Hadoop, HBase, and Spark for web archive analysis, developed by Milad Gholami and Jimmy Lin. W/ARC files must be ingested into HBase before processing can be carried out. This data store can additionally serve as a back-end for OpenWayback. A virtual machine setup using VirtualBox and Vagrant is available. Includes documentation.
Blacklight and Ruby on Rails software leveraging a fork of the webarchive-discovery indexer to provide keyword searching and faceting in an integrated user interface, developed by the Web Archives for Historical Research Group.
Java software leveraging MySQL and Tomcat to provide a local web service for web archive exploration, developed by the University of Maryland ADAPT team. The local web service allows URL string searches and drilling down into the details of individual archived objects. Includes documentation.