Web archiving

Mark Matienzo

We are pleased to announce that Mark Matienzo is joining Stanford Libraries as of September 19, 2016 as our Collaboration & Interoperability Architect. Mark will be joining Stanford from DPLA (the Digital Public Library of America) where he currently serves as the Director of Technology. He has previously worked as an archivist, a digital library software developer, and the technical architect for the ArchivesSpace project, at institutions including DPLA, the Yale University Library, and The New York Public Library.

logo of the International Internet Preservation Consortium

In keeping with shallow tradition, it's taken me a few weeks to collect my thoughts on the recently-concluded IIPC General Assembly and Web Archiving Conference, hosted this year by the National and University Library of Iceland. In the wake of last year's meeting, I speculated on what developments in web archiving we might together effect in the year ahead (now behind). Nearly a year later, that conceit provides a convenient jumping-off point for reflecting on how it all went, where we might go from here, and the tremendous amount of work to do in our one remaining collective month before the anniversary of that post. :)

headshot of Niels Brügger

Dovetailing with our recent announcement of documentation of resources for research using web archives, we will be visited next month by an individual who has done much to advance web archives as materials of scholarly interest and exploration. Niels Brügger is Professor of Internet Studies and Digital Humanities at Aarhus University in Denmark, where he also heads the Centre for Internet Studies and NetLab. On Thursday, April 7th he will present "Digital Humanities, Web History, Web Archives, and Web Research Infrastructure - between close and distant reading," followed by discussion. Additional event details may be found on the Stanford Event Calendar page. We hope you'll join us!

screenshot from Exploring the Canadian Political Interest Group and Political Parties Web Sphere

Since our collaboration with political science researchers using web archives to understand the 2014 U.S. congressional elections, we've seen (and, hopefully, helped foster) growing interest in web archives as primary source material. This trend parallels a similar refocusing by other web archiving programs toward enhancing access services and facilitating research use. The maturity and the variety of these efforts, as well as the accumulating body of resulting research, provide an expanding list of references with which to orient and entice prospective researchers to the potential of working with web archives.

#ethics @ #webarc15

A welcome complement to the lately growing number of web archiving-specific events, the inaugural Web Archives: Capture, Curate, Analyze conference (tweet stream) brought together an eclectic crowd of researchers, instructors, students, archivists, librarians, developers, and others interested in web archiving. A novel mixture of institutions was also represented - some active principally through IIPC, many more associated with the SAA Web Archiving Roundtable and/or Archive-It Partner communities, and still others whom I'd not yet encountered in these more established, practitioner-centric fora.

Echoing the sentiments of other participants, I was impressed and inspired both by the diversity of perspectives and the excitement for moving web archiving forward. As befits such a group, the schedule and hallway conversations crossed a wide array of topics. Running through it all, though, questions of ethics seemed to be a persistent subject. I'll highlight three areas of ethical concern that stood out for me.

logo graphic appearing on the "WorldWideWeb SLAC Home Page" in 1993

The world's first websites were built for very different rendering and navigation interfaces than the comparatively advanced browsers available today. Thanks to the work of web archivists (e.g., CERN, SLAC), we can celebrate the incongruity of accessing some of these ancient websites using modern browsers. While a traditional goal of web archiving has been to preserve the "canonical" user experience of a website, this has been persistently impaired by (among other challenges) accessing web archives using software other than what would've been available at the time content was archived.

logo of the Society of American Archivists

"What does it take to archive a linear foot of the Web?" Anna Perricci posed rhetorically to our web archiving metrics breakout discussion group two weeks ago. I don't yet have a good answer for what the question's getting at, but I was gratified by the level of interest and engagement in web archiving as archiving at the just-concluded Society of American Archivists (SAA) Annual Meeting and inaugurally coscheduled Archive-It Partner Meeting.

logo of the 2015 Joint Conference on Digital Libraries

We've written before on our restoration of the oldest U.S. website, covering in detail how we did it and some interesting discoveries we made along the way. More recently, Web Archiving Engineer Ahmed AlSum prepared a visual diagram (see below) of the steps involved in packaging, indexing, and making accessible the legacy web content in a poster for the Joint Conference on Digital Libraries (JCDL), an annual meeting sponsored by the Association for Computing Machinery (ACM) and the Institute of Electrical and Electronics Engineers (IEEE) focused on research and development for digital libraries. Notably, the display won the Best Poster Award! We celebrate the continued community interest in Ahmed's innovative work.

winning poster for JCDL 2015, titled 'Reconstruction of U.S. First Website'
