'The Dish (HDR)' (under CC BY-NC-SA 2.0)

A couple of weeks ago, Stanford University Libraries hosted Dame Wendy Hall, Jim Hendler, and other web scientists affiliated with the Web Science Trust for a briefing on the Web Observatory initiative and a follow-on workshop organized by Lisa Green from Common Crawl. The notion of a Web Observatory implies a center proffering scientific instruments, but for the analysis of web data rather than natural phenomena. Indeed, the group's vision is that Web Observatories provide access to web datasets, projects, and tools. Eventually, a network of Web Observatories might offer both an interoperable architecture and distributed infrastructures for sharing and analysis of web datasets. The initiative touches on several areas of interest and investment by Stanford University Libraries, including data curation, web archiving, and supporting social science research.

Social science research increasingly depends on computational methods and digital primary materials. As a case in point, the listserv of the Association of Internet Researchers (AoIR), an organization for social science research on networked communications, features regular discussions on web data collection and analysis. A perusal of those conversations underscores the dearth of reusable web datasets and the one-off nature of new datasets that are created. In the context of research data more broadly, it is for this and other reasons that research libraries increasingly offer data curation services. Persistent access to well-described data is only one part of the puzzle, though; as Victoria Stodden noted in the 2013 Forum on the Future of Scientific Publishing, the review, reproduction, and/or reinterpretation of computational analyses also demands the continued availability of the employed applications (PDF). The Web Observatory architecture natively recognizes this requirement.

The web archiving community, meanwhile, collectively hosts petabytes of historical web data and grapples with the specification of the fundamental set of services (PDF) to support common research use cases. Common Crawl itself provides access to hundreds of terabytes of web data through the Amazon Web Services Public Data Sets platform. Working with (a manageable subset of) this corpus was the focus of the follow-on workshop. The research that the Common Crawl data is more broadly enabling (including by Stanford-affiliated researchers) is a useful demonstration of the interest in web datasets, the kinds of services that researchers may be interested in, and the potential of the Web Observatories initiative.

As we continue to develop our web archiving services, in particular, we will look for opportunities to align with and contribute to the Web Observatories framework.

'Step 7' (under CC BY-NC-ND 2.0)

A major challenge for web archivists is that downstream archiving has little visibility among upstream web content creators. And yet deliberate and inadvertent architectural decisions made by web content creators strongly impact the ease or difficulty with which their websites can be captured and faithfully re-presented. A non-trivial byproduct of webmasters helping to ensure their content is archived for their own later use is that the Web itself becomes more archivable, to everyone's benefit.

The Stanford University Libraries is one of the founding partners of the International Image Interoperability Framework (http://iiif.io), which aims to enable broad access to cultural heritage images on the web. This exciting initiative is in its fifth year and is beginning to have an impact on the way digital images are used to support research and teaching. The IIIF editors recently released version 2.0 of the IIIF APIs, which is a major step towards creating a stable and sustainable technology framework for image interoperability.

To celebrate this progress, the IIIF community is hosting a one-day information-sharing event at the British Library about the use of images in and across cultural heritage institutions. The day will focus on how museums, galleries, libraries, and archives, or any online image service, can take advantage of a powerful technical framework for interoperability between image repositories. This event will be valuable for organizational decision makers, repository and collection managers, software engineers, and anyone interested in exploring the wide range of use cases that are seamlessly enabled by the framework.

Attendance is free, and widespread dissemination of the event is encouraged.

A detailed program is available at http://iiif.io/event/2014/london.html and those interested can register to attend at http://bit.ly/iiiflondon2014.

 

Tape container for Wind (1961)

The Archive of Recorded Sound is delighted to announce that the Richard Maxfield Collection (ARS.0074) can now be listened to online, via the collection's finding aid on the Online Archive of California. This collection features nine distinct works by electronic music composer Richard Maxfield, composed between 1959 and 1964, four of which are believed to be previously unpublished (Dromenom, Electronic Symphony, Suite from Peripateia, and Wind). Additionally, as Maxfield frequently produced unique edits of his work for each performance, many of the open tape reels that form this collection include alternative edits to those previously published, such as the tapes for Amazing Grace, which feature three different versions of the work.

Screen shot of Maps of Africa exhibit front page

The Stanford University Libraries (SUL) is pleased to announce the release of Spotlight, an innovative solution that enables libraries and other cultural heritage institutions to build online exhibits from content in their repositories to better highlight their digital collections.

Spotlight is a plugin for Blacklight, a popular open source solution for building library discovery environments. Spotlight enhances Blacklight by providing a self-service, forms-based user interface that allows exhibit-builders, such as librarians or faculty, to customize the search interface and homepage, and to build media-rich feature pages to better contextualize their collections.

Stanford first announced the development of Spotlight in early February of 2014, following a months-long process of design and community outreach to validate the need for such a solution in the digital library community and obtain feedback on our approach. This was followed by a twelve-week cycle of software development that has culminated in the release of Spotlight version 0.1.0, available as open source software on GitHub.

This first release of Spotlight is best suited to featuring digitized still image collections.  The first production exhibit built with Spotlight was recently completed by SUL's Digital and Rare Maps Librarian, and features a spectacular set of digitized maps of Africa.  A brief video tour of this first online exhibit can be viewed on YouTube.


Spotlight enables an exhibit builder to heavily customize many elements of the user experience, and to build rich feature and about pages to give viewers a deeper understanding of the collection and its items.  This YouTube video gives a tour of Spotlight from the exhibit-builder's perspective, and demonstrates many of the available customization features.


The 0.1.0 release of Spotlight is only the beginning. Our goal at Stanford is to work with library staff and content experts to build several more sites in the coming months as a way to user-test the software, identify bugs and enhancement opportunities, and, most importantly, begin exposing more of Stanford Libraries' rich image resources. We are also working with peer institutions to adopt and test this first version with the intention that Spotlight will grow as a community-supported, open-source solution. We encourage you to download it, give it a try, and send us feedback.

And certainly the engineering work is far from complete.  There is a backlog of issues to address and several areas we have identified for future development:

  • Selection and indexing : the tools and workflow for adding new content to a Spotlight index and updating metadata as it changes in the repository. 
  • Support for more content types : Spotlight currently supports digital still image collections, and we hope to add support for audio, video, PDF, datasets, geospatial objects, web archives and more.  
  • Theming : the ability for builders to choose from multiple visual themes to apply to an exhibit or collection, and to add custom header images and branding. 
  • Repository integration : currently, a Spotlight exhibit can be built on top of any Solr index. Work has begun to more easily create new Spotlight indexes directly from digital repository systems, and to save exhibit-specific metadata and supporting content into repositories. Our initial integration efforts are focused on the Fedora repository system, but we hope integration with other platforms will follow.

Spotlight is being built by an exceptionally talented group of engineers in the Digital Library Systems and Services division of SUL, with support from the software engineering firm Data Curation Experts (DCE).  The team includes Gary Geisler, Chris Beer, Jessie Keck, Jack Reed and Christopher Jesudurai (all from Stanford), and Justin Coyne from DCE.

Follow our progress, or better yet download and install the software at http://github.com/sul-dlss/spotlight.

Send us feedback at exhibits-feedback@lists.stanford.edu.

logo of the International Internet Preservation Consortium

Web archivists Ahmed AlSum and Nicholas Taylor and LOCKSS Chief Scientist David Rosenthal recently attended the International Internet Preservation Consortium (IIPC) General Assembly, an annual meeting of national libraries, research universities, non-profits, and service providers engaged in web archiving. This was the first General Assembly we all attended since Stanford University Libraries (SUL) joined the IIPC, though we had all previously attended meetings under the auspices of other organizations.

'Material' (under CC BY-NC 2.0)

Congressional campaign websites are valuable primary source material for historians, social scientists, and the public to better understand the evolution of political communication in the Web era. Campaign websites also afford unique opportunities for the mass collection of materials that would have been previously difficult to acquire outside of the candidate's district. While it is a truism that the Web is constantly changing and broken links are an inevitable outcome, campaign websites are predictably ephemeral given their time-limited purpose.

Australian soprano Marjorie Lawrence in an undated publicity photo

British Pathé just released an astounding 85,000 archival film clips on YouTube. Included are numerous clips of musical interest, including great singers, instrumentalists, and conductors; music making in the home and community; musical oddities; and unique performances and venues. One clip that caught my attention today is of Australian soprano Marjorie Lawrence making her first standing appearance after being stricken with polio (she's performing with the Chicago Symphony Orchestra, 1947). Her story was memorably told in the Hollywood film based on her memoirs, Interrupted Melody, starring Eleanor Parker as Lawrence.
