Digital Library Blog

Five new digital collections are now available in SearchWorks. These new collections take advantage of SearchWorks' ability to provide users with rich discovery and access capabilities for finding and working with digital collection content.

Marge Frantz lectures on McCarthyism, 2003 

The materials consist of videorecordings of lectures on McCarthyism by Marge Frantz. Lectures were part of an anthropology class taught by Dr. S. Lochlann Jain.

Collection Contact: Daniel Hartwig

Professor John Willinsky

John Willinsky waited for a couple of weeks after the fall quarter had started to give the Graduate School of Education (GSE) faculty and students some time to settle in to their routines before sending out the big news:

'The Dish (HDR)' (under CC BY-NC-SA 2.0)

A couple of weeks ago, Stanford University Libraries hosted Dame Wendy Hall, Jim Hendler, and other web scientists affiliated with the Web Science Trust for a briefing on the Web Observatory initiative and a follow-on workshop organized by Lisa Green from Common Crawl. The notion of a Web Observatory implies a center proffering scientific instruments, but for the analysis of web data rather than natural phenomena. Indeed, the group's vision is that Web Observatories provide access to web datasets, projects, and tools. Eventually, a network of Web Observatories might offer both an interoperable architecture and distributed infrastructures for sharing and analysis of web datasets. The initiative touches on several areas of interest and investment by Stanford University Libraries, including data curation, web archiving, and supporting social science research.

Social science research increasingly depends on computational methods and digital primary materials. As a case in point, the listserv of the Association of Internet Researchers (AoIR), an organization for social science research on networked communications, features regular discussions on web data collection and analysis. A perusal of those conversations underscores the dearth of reusable web datasets and the one-off nature of new datasets that are created. In the context of research data more broadly, it is for this and other reasons that research libraries increasingly offer data curation services. Persistent access to well-described data is only one part of the puzzle, though; as Victoria Stodden noted in the 2013 Forum on the Future of Scientific Publishing, the review, reproduction, and/or reinterpretation of computational analyses also demands the continued availability of the employed applications (PDF). The Web Observatory architecture natively recognizes this requirement.

The web archiving community, meanwhile, collectively hosts petabytes of historical web data and grapples with the specification of the fundamental set of services (PDF) to support common research use cases. Common Crawl itself provides access to hundreds of terabytes of web data through the Amazon Web Services Public Data Sets platform. Working with (a manageable subset of) this corpus was the focus of the follow-on workshop. The research that the Common Crawl data is more broadly enabling (including by Stanford-affiliated researchers) is a useful demonstration of the interest in web datasets, the kinds of services that researchers may be interested in, and the potential of the Web Observatories initiative.

As we continue to develop our web archiving services, in particular, we will look for opportunities to align with and contribute to the Web Observatories framework.

'Step 7' (under CC BY-NC-ND 2.0)

A major challenge for web archivists is how little visibility downstream archiving has among upstream web content creators. And yet, deliberate and inadvertent architectural decisions made by web content creators strongly affect the ease or difficulty with which their websites can be captured and faithfully re-presented. A non-trivial byproduct of webmasters helping to ensure their content is archived for their own later use is that the Web itself becomes more archivable, to everyone's benefit.

San Francisco Ferry Building and streetcar: one of thousands of images used by the Image, Video, and Multimedia Systems research team to test image search algorithms

When you think about scientific data, you might think primarily about numbers and graphs and charts. But some data sets consist of rich image collections, including these data sets that have been preserved in the Stanford Digital Repository!

 

Stanford University Libraries is one of the founding partners of the International Image Interoperability Framework (http://iiif.io), which aims to enable broad access to cultural heritage images on the web. This exciting initiative is in its fifth year and is beginning to have an impact on the way digital images are used to support research and teaching. The IIIF editors recently released version 2.0 of the IIIF APIs, which is a major step towards creating a stable and sustainable technology framework for image interoperability.

To celebrate this progress, the IIIF community is hosting a one-day information-sharing event at the British Library about the use of images in and across cultural heritage institutions. The day will focus on how museums, galleries, libraries and archives, or any online image service, can take advantage of a powerful technical framework for interoperability between image repositories. This event will be valuable for organizational decision makers, repository and collection managers, software engineers, and anyone interested in exploring the wide range of use cases that the framework enables.

Attendance is free, and you are encouraged to share news of the event widely.

A detailed program is available at http://iiif.io/event/2014/london.html and those interested can register to attend at http://bit.ly/iiiflondon2014.

 

BitCurator workshop

Porter Olsen from the Maryland Institute for Technology in the Humanities (MITH) hosted a full-day webinar at Stanford University on Friday, August 29, 2014, to introduce archivists from Stanford University Libraries, the Hoover Institution, and UC Berkeley’s Bancroft Library to BitCurator, an open-source all-in-one suite of digital forensics tools.

Dr. Rob Sanderson

In a move that will have a profound and long-lasting impact on the library sector, the W3C officially chartered a new working group on Web Annotation on August 20, 2014. Stanford Libraries staff member Rob Sanderson will serve as the working group's inaugural co-chair.

The W3C is the standards body that guides the development of the Web, and has had a longstanding Open Annotation Community Group focused on how to annotate digital resources on the Web. As a newly chartered working group, the output of these discussions can now be channeled into official W3C recommendations, and baked into the fabric of the Web itself.

As library content and services become increasingly digital, the ability to annotate them--provide commentary, analysis, reviews, transcription, description, links and more--is increasingly a concern. By helping define a standard approach to annotation (in the broadest sense) of web resources, libraries can help fulfill their traditional mission of supporting research, scholarly communication and the diffusion of knowledge in the 21st century. And by working deeply in standards efforts like those of the W3C, libraries can help ensure their technologies and services are integral to the latest information technologies and make use of them, instead of competing with them or lagging behind.

Dr. Sanderson, who joined Stanford Libraries in April of 2014, brings extensive experience in annotations to the W3C and Stanford. He was one of the principal investigators of the Open Annotation Collaboration, a precursor to the W3C community group, where he also served as co-chair and a driving force. In recognition of his ongoing contributions and position within the community, Dr. Sanderson is serving as one of the co-chairs of the Working Group, which is a boon for the W3C, for Stanford, and for the future of annotation on the Web. 
