Archiving Instagram posts

September 9, 2021
Peter Chan

Stanford Libraries’ Web Archiving Program uses Archive-It as its preferred solution for curating and archiving topical web archive collections. Of the available options, it is the best for 1) data capture efficacy and 2) support for our curatorial workflow. Websites crawled with Archive-It can be accessioned to the Stanford Digital Repository (SDR), made available through the Stanford Web Archiving Portal (SWAP), and made discoverable in SearchWorks using a dedicated web archiving workflow developed by the Libraries over the past few years (see https://consul.stanford.edu/display/WARC/Web+Archiving).

However, some sites cannot be archived properly by Archive-It, including Instagram, Facebook, and sites created with the popular Wix service. While some of these sites can be captured in high fidelity using other services such as Webrecorder and Archive Web.Page, our current web archiving workflow cannot handle the files those services produce.

A recent collaboration among Stanford University Press, Webrecorder, and DLSS has enabled websites captured with the Webrecorder toolset to be stored in the SDR and made available via the Archive Web.Page interface. This new capability allows us to create a new workflow that provides access to archived Instagram sites. The main differences between our existing web archiving workflow (using Archive-It) and the new workflow are as follows:

Aspect | Existing workflow | New workflow
Web archive tool | Archive-It (subscription) | Archive Web.Page (free)
Social media, Wix sites | Not properly archived | Archived in high fidelity
Curation support | Extensive | Minimal
Archiving process | Mostly automated | Mostly manual
Thumbnails in SearchWorks | Generated by system | Created manually
Accession to SDR | Tailor-made for web archives | Generic process for images and files
Viewing environment | SWAP (Stanford Web Archiving Portal) | Archive Web.Page interface

Here is an example of Instagram posts archived using the new workflow: https://searchworks.stanford.edu/catalog?f%5Bcollection%5D%5B%5D=jz413tt7854

Ideally, Archive-It would enhance its system to properly archive Instagram, Facebook, and sites created with the Wix service, and our existing web archiving workflow would then support captures of those sites as well. Until then, we can use the new workflow to archive Instagram, Facebook, and Wix sites that are critical to our collections.

I would like to thank Ilya Kreymer (Webrecorder), Jasmine Mulliken (Stanford University Press), Andrew Berger (DLSS), Josh Schneider (University Archives), and Jessica Cebra (Metadata) for making the new workflow possible.
