Looking ahead from the 2015 IIPC General Assembly

May 18, 2015
Nicholas Taylor

A couple of weeks have passed since the successful conclusion of the annual IIPC General Assembly, hosted this year by Stanford University Libraries and Internet Archive. The meeting has been well summarized already in posts by Sawood Alam, Jefferson Bailey, Emmanuelle Bermes, Tom Cramer, Carlos Eduardo Entini, and Ian Milligan. Rather than contributing another retrospective, I'd like to instead look ahead to 2016 and consider what the web archiving community might accomplish together in the coming year, highlighting some of the opportunities discussed and presented two weeks ago.

Continue the "mainstreaming" of web archives as primary research materials

It was gratifying to see the breadth of both research disciplines and research support initiatives represented at the General Assembly. I hope to see brilliant new scholarship in the coming year from the maturing community of researchers working with web archives and believe that we're also well-positioned to make inroads with many who haven't worked with web archives before. Continued experimentation is needed not just in tools and interfaces but also in service and engagement models. We should mind, and then mine, local models of success for replicable access and research services.

Explore (and implement?) at least one core API

There's no shortage of potential APIs we could specify, standardize, and build to improve the interoperability and modularity of web archiving systems. Based on areas of recent community interest, conspicuous candidates might be interfaces between Heritrix, archiving proxies, and headless browsers; an export API for distributed preservation and for delivery of derivative datasets; and a framework for interoperable annotation of archived web content. Any of these would be good pretexts for fostering developer collaboration across institutions as well as exploring new working group models.

Standardize measurement of our web archives

Applying standardized measurements to our archives would help us to better understand their contents and communicate their value. With the recently reported progress on web archive profiling, this may be the year for distributed testing, publication, and aggregation of actionable Memento profiles. Andy Jackson's compelling visualization of link rot and content drift in the UK Web Archive could be reproduced (via halflife) across our archives and used as a powerful advocacy tool, both for our local programs and for our collective endeavor.

Broaden the contributors to the Open Wayback project

The Open Wayback development effort will be most sustainable when its community of contributors is more commensurate with its broader community of stakeholders. While pull requests are always welcome, there are many ways beyond code to contribute to the project that draw upon the diverse skills of the web archiving community: join and participate in discussions on the openwayback-dev list, submit or comment on existing bug reports or feature enhancement requests, use and recommend improvements to the documentation, or test release candidates and report feedback.

Generalize work on full-text search

It's telling that full-text search, along with data mining, was of sufficient interest that what was ostensibly an Access Working Group session took place in plenary. Given how broadly interested we are in full-text search, it'd be great to see stronger collaboration across institutions. For my part as co-lead of the Access Working Group, and reflecting also the inclination of my fellow co-lead Daniel Gomes, I'd like to identify potential work related to full-text search that is of both broad utility and interest. An early candidate for this work could be to create a coded reference dataset that could be used to optimize retrieval relevance.

An incrementally (or radically?) more mission-supporting Consortium Agreement

The impending expiration of the latest three-year Consortium Agreement (PDF) is an opportunity to retool the organization to better achieve our goals. At a minimum, separating out from the Agreement how the IIPC operates and encoding that in by-laws would allow the organization to be more dynamic and adaptable. Based on some of the conversations in the breakout groups and within the Steering Committee, we might more ambitiously consider how to engage the participation of affiliated communities (in web archiving, if not the IIPC per se); the organizational changes necessary to enable solicitation of external funding; and novel project or working group models better adapted to getting stuff done.

Needless to say, this is an abbreviated list; I'm also excited for what may come out of community discussions during at least three upcoming meetings, as well as the Collection Development Working Group's candidate collaborative collections, greater institutional adoption/adaptation of WebRecorder.io, the WARC standard revision process (PPTX), and more. I look forward to working with the IIPC and larger web archiving community to capitalize on these and other manifest opportunities in the coming year!