Digital Library Blog

The International Image Interoperability Framework (http://lib.stanford.edu/iiif) is an initiative driven by several major research and national libraries to enable the rich and robust delivery of digital images through common interfaces, and to spur the development of open source and commercial software solutions in this space.

The IIIF Working Group invites comment and feedback on a proposed API for delivering images via a standard HTTP request. The full specification can be found at:

http://library.stanford.edu/iiif/image-api

The IIIF Image API specifies a web service that returns an image in response to a standard HTTP or HTTPS request. The URL can specify the region, size, rotation, quality characteristics and format of the requested image. A URL can also be constructed to request basic technical information about the image, to support client applications.
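Because every request is just a URL, a client can assemble one from its parameters. The Python sketch below follows the `{identifier}/{region}/{size}/{rotation}/{quality}.{format}` path order of the IIIF Image API. The base URL, identifiers and function names are hypothetical, and the quality keyword varies between spec versions (`default` in later versions; earlier drafts used names such as `native`), so check which version a given server implements.

```python
from urllib.parse import quote

def image_url(base, identifier, region="full", size="full",
              rotation=0, quality="default", fmt="jpg"):
    """Build {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{fmt}.

    region: "full" or "x,y,w,h" in pixels; size: "full" or e.g. "w," to
    scale to a width while keeping the aspect ratio.
    """
    # Identifiers may contain "/" or spaces, which must be percent-encoded
    # so they stay a single path segment.
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"

def info_url(base, identifier):
    """Build the URL for the image's technical information document."""
    return f"{base.rstrip('/')}/{quote(identifier, safe='')}/info.json"
```

For example, `image_url("https://images.example.org/iiif", "MS 1/f. 1r", region="0,0,1000,800", size="500,", rotation=90)` asks for a 1000×800 pixel crop from the top-left corner, scaled to 500 pixels wide and rotated 90 degrees, returned as a JPEG.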

In March, approximately 2,100 objects representing three collections were accessioned to the Stanford Digital Repository (SDR).

  • R. Stuart Hummel collection: ~ 2,100 items
  • The Life of Saint Catherine, Codex M0381: 1 manuscript
  • Special collection requests: 1 thesis

More details, including links to sample images, are listed below.

While many of these objects are already discoverable via SearchWorks, others will get SearchWorks records in the coming months. However, all materials are currently available via the item’s PURL (a persistent URL, which ensures that these materials remain available from a single URL over the long term, regardless of changes in file location or application technology).

In February, approximately 7,000 objects representing six collections were accessioned to the Stanford Digital Repository (SDR), bringing the total number of objects in SDR to nearly 250,000.

  1. Buckminster Fuller collection: 5,200 slides
  2. Kitai topographical maps: 1,600 maps
  3. McLaughlin Maps, California as an Island: 114 maps
  4. R. Stuart Hummel collection: 52 items
  5. Eliasaf Robinson collection addendum: 1 gazette
  6. Islamic prayer book, 1228 H: 1 manuscript

More details, including links to sample images, are listed below.

Inclusion in the Stanford Digital Repository ensures that these materials are available to researchers and scholars (while upholding appropriate access restrictions), now and in the future through a secure, sustainable stewardship environment.

While many of these objects are already discoverable via SearchWorks, others will get SearchWorks records in the coming months. However, all materials are currently available via the item’s PURL (a persistent URL, which ensures that these materials remain available from a single URL over the long term, regardless of changes in file location or application technology).

The (meta)data underneath SearchWorks is largely based on our MARC records from Symphony. MARC records are exported from Symphony, then slurped up by an application called SolrMarc, which transforms the MARC data into an index for the Solr search engine used by SearchWorks.
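To make the MARC-to-Solr step concrete, here is a minimal Python sketch of field mapping, where each Solr field is paired with a MARC tag plus the subfield codes to pull from it (for example `245ab` for title and subtitle). The mapping idea is loosely modeled on SolrMarc's approach, but this is not SolrMarc's code or configuration syntax: the `DataField` class, the spec format and the Solr field names are hypothetical, and a real pipeline would read records exported from Symphony with a MARC library rather than building them by hand.

```python
from dataclasses import dataclass

@dataclass
class DataField:
    """Hypothetical, minimal stand-in for a MARC data field."""
    tag: str
    subfields: list  # (code, value) pairs in record order

def extract(record, spec):
    """Collect values for a spec such as '245ab' (tag 245, subfields a and b).
    Each matching field yields one string joining its selected subfields."""
    tag, codes = spec[:3], set(spec[3:])
    values = []
    for df in record:
        if df.tag != tag:
            continue
        parts = [value for code, value in df.subfields if code in codes]
        if parts:
            values.append(" ".join(parts))
    return values

def to_solr_doc(record, mapping):
    """Apply a {solr_field: spec} mapping; fields with no values are omitted."""
    doc = {}
    for solr_field, spec in mapping.items():
        values = extract(record, spec)
        if values:
            doc[solr_field] = values
    return doc
```

Given a hand-built record with a 100 field and a 245 field containing subfields a, b and c, the mapping `{"author_display": "100a", "title_display": "245ab"}` yields a document with the author name and the title joined to its subtitle, leaving out the 245 $c statement of responsibility.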

SolrMarc is open source software made available by Bob Haschart of the University of Virginia Libraries. SolrMarc is used by all(?) VuFind sites as well as most Blacklight sites built on MARC data (e.g. SearchWorks). SolrMarc has been great for us -- it gave us an enormous jump start for SearchWorks. Bob is also a great guy, and made me a "committer" almost immediately -- so I can make contributions to the open source code.

But.

Open source software does best when there is a critical mass of developers: group wisdom rocks, as does sharing the work. To date, SolrMarc is very much Bob's project, despite a number of committers such as myself. There are some ... interesting ... practices as to how SolrMarc is organized and how it is tested. I've even contributed a bit to some of its squirreliness. Occasionally, changes to the SolrMarc codebase break the code I've written especially for Stanford.

The Digital Production Group is very excited about an upcoming project featuring the personal papers of Laura Bassi, "a noted 18th-century Italian scientist and Europe's first female professor," with Project Manager Cathy Aster at the helm.

More information to come, but in the meantime take a look at this recent article in the Stanford University News.

We've been examining whether or not to restore stopwords to the SearchWorks index. Stopwords are words ignored by a search engine when matching queries to results. Any list of terms can be a stopword list; most often the stopwords are the most commonly occurring words in a language, sometimes restricted to particular parts of speech (for example, articles and prepositions but not verbs or nouns).

The original purpose of stopwords in search engines was to improve index performance (query matching time and disk usage) without degrading result relevancy (and possibly improving it!). It is common practice for search engines to employ stopwords; in fact, Solr (http://lucene.apache.org/solr), the search engine behind SearchWorks, has English stopwords turned on by default.

In our implementation of SearchWorks, there was no compelling reason to change most of the default Solr settings; thus, since SearchWorks's inception we have been using the following stopword list: a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not, of, on, or, s, such, t, that, the, their, then, there, these, they, this, to, was, will, with.

What follows is an analysis of how stopwords are currently affecting SearchWorks, and what might happen if we restore stopwords to SearchWorks, making every word significant for every search.
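To see what restoring stopwords would change, here is a small Python sketch that applies the stopword list above to queries and titles. The tokenizer is a crude stand-in for Solr's analysis chain (in Solr itself, stopword removal is typically configured with a `solr.StopFilterFactory` in `schema.xml`), the all-terms matching rule is a simplification of real query parsing, and the sample titles are made up for illustration.

```python
import re

# The stopword list quoted above (Solr's default English list, as used by SearchWorks).
STOPWORDS = frozenset(
    "a an and are as at be but by for if in into is it no not of on or s such "
    "t that the their then there these they this to was will with".split()
)

def analyze(text, stopwords=STOPWORDS):
    """Crude analyzer: lowercase, split on non-alphanumeric characters,
    and drop any stopwords. "Stanford's" becomes "stanford" + "s", which
    is one reason "s" and "t" appear on the list."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in stopwords]

def matches(query, title, stopwords=STOPWORDS):
    """True if every analyzed query term appears among the title's terms
    (word order and phrases are ignored in this simplification)."""
    terms = analyze(query, stopwords)
    if not terms:
        return False  # e.g. "to be or not to be": nothing left to search for
    title_terms = set(analyze(title, stopwords))
    return all(term in title_terms for term in terms)

if __name__ == "__main__":
    titles = ["The Who: maximum R&B", "Who is who in America", "Hamlet"]
    # Stopwords on: "the who" reduces to "who" and matches the first two titles.
    print([t for t in titles if matches("the who", t)])
    # Stopwords restored as ordinary words: "the" now has to match too.
    print([t for t in titles if matches("the who", t, stopwords=frozenset())])
```

With stopwords removed, the query `the who` matches any title containing "who"; with an empty stopword set, "the" becomes significant and only the first title matches. A query made entirely of stopwords, such as `to be or not to be`, matches nothing in this sketch unless stopwords are kept.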

We're pleased to announce the release of Version 1.4 of Parker on the Web, the fourth incremental site release since the launch of Version 1.0 in Fall 2009.

Version 1.4 constitutes the most substantial corrective content release to date. More than 500 additional image reshoots are now integrated into the site, along with a host of sequencing corrections, which have affected 151 manuscripts (27 per cent of the online collection). The reshoots either replace existing images with better-quality versions or provide images for selected manuscript pages that had previously been overlooked for digitization. This brings image accuracy to 99 per cent or better across more than 200,000 images. Corrections to manuscript descriptions, summaries and bibliography are also incorporated in this release, and approximately 100 new bibliographic citations have been added to the site.

This release posed a significant technical challenge, requiring numerous QC passes and analysis of image rendering problems in the web application. DLSS generated JP2 derivatives from the Cambridge-produced TIFF reshoot masters, but JPEGs consistently failed to render from those JP2s because of an "unsupported bit depth" error. Combine the resolution of this problem with the complex interleaving and replacement of selected images, plus detailed sequence file corrections, and you have a set of interlocking issues that required a great deal of time and attention to detail to resolve.
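One way to catch this class of problem before derivatives are generated is to check each master's bit depth up front. The Python sketch below reads the BitsPerSample tag (tag 258) from the first image directory of a classic TIFF using only the standard library, and flags masters that don't match an expected depth. The post doesn't say which bit depth was unsupported, so `EXPECTED_BITS = 8` is an assumption, and the function names are hypothetical; BigTIFF, multi-page files and the JP2 derivatives themselves are out of scope.

```python
import struct

BITS_PER_SAMPLE = 258  # TIFF tag number for BitsPerSample
SHORT = 3              # TIFF field type code for a 16-bit unsigned integer
EXPECTED_BITS = 8      # assumption: the renderer only handles 8-bit samples

def tiff_bits_per_sample(data: bytes) -> list:
    """Return the BitsPerSample values from the first IFD of a classic TIFF."""
    order = {b"II": "<", b"MM": ">"}.get(data[:2])  # little- or big-endian
    if order is None:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF is not handled)")
    (count,) = struct.unpack(order + "H", data[ifd_offset:ifd_offset + 2])
    for i in range(count):
        entry = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        tag, ftype, n = struct.unpack(order + "HHI", data[entry:entry + 8])
        if tag != BITS_PER_SAMPLE:
            continue
        if ftype != SHORT:
            raise ValueError("unexpected field type for BitsPerSample")
        if n <= 2:
            start = entry + 8  # up to two SHORTs are stored inline
        else:
            (start,) = struct.unpack(order + "I", data[entry + 8:entry + 12])
        return list(struct.unpack(order + "H" * n, data[start:start + 2 * n]))
    return [1]  # the TIFF default when the tag is absent

def needs_review(data: bytes, expected: int = EXPECTED_BITS) -> bool:
    """True if any channel's bit depth differs from the expected depth."""
    return any(bits != expected for bits in tiff_bits_per_sample(data))
```

Run over a directory of masters (for example, `needs_review(path.read_bytes())` for each file), this gives a list of files to inspect before derivative generation, rather than discovering the problem at render time.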

Kudos for a job well done to Chris Jesudurai, Doris Cheung and Tony Calavano, along with Suzanne Paul from Corpus Christi College. A great team effort!

I recently attended a workshop of the KEEP project (Keeping Emulation Environments Portable) in Rome. KEEP is an EU-funded project to develop software that virtualizes old computer hardware and software environments. This allows you to run old operating systems, and the applications that were designed for them, on modern computers. KEEP is a multi-partner project whose consortium includes national libraries (BNF, Koninklijke Bibliotheek), the University of Portsmouth, a computer history museum (Computerspiele Museum), commercial partners (Tessella), and the European Game Developers Association.

The project is scheduled to end in February 2012 and has already released software version 1.0.0 on SourceForge (http://emuframework.sourceforge.net/). This version supports:
* 5 platforms: x86, C64, Amiga, BBC Micro, Amstrad
* 6 emulators included: Dioscuri, Qemu, VICE, UAE, BeebEm, JavaCPC
* 22 file formats supported: PDF, TXT, XML, JPG, TIFF, PNG, BMP, Quark, ARJ, EXE, disk/tape images and more
* Integration with the FITS format identification tool
* Web services for software and emulator archives
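The framework's job can be pictured as choosing a route from an identified file format to a platform and an emulator that can run it. The Python sketch below illustrates that idea using the platforms and emulators listed above. The format-to-platform table and the `pathways` function are hypothetical and are not the Emulation Framework's actual data or API; in practice, the format string would come from an identification tool such as FITS.

```python
# Emulators available for each platform in release 1.0.0 (from the list above).
EMULATORS = {
    "x86": ["Dioscuri", "Qemu"],
    "C64": ["VICE"],
    "Amiga": ["UAE"],
    "BBC Micro": ["BeebEm"],
    "Amstrad": ["JavaCPC"],
}

# Hypothetical format-to-platform table, for illustration only.
FORMAT_PLATFORM = {
    "pdf": "x86",
    "tiff": "x86",
    "d64": "C64",        # C64 disk image
    "adf": "Amiga",      # Amiga disk file
    "ssd": "BBC Micro",  # BBC Micro disk image
}

def pathways(fmt: str) -> list:
    """Return (format, platform, emulator) routes for an identified format,
    or an empty list if no platform is known for it."""
    key = fmt.lower()
    platform = FORMAT_PLATFORM.get(key)
    if platform is None:
        return []
    return [(key, platform, emulator) for emulator in EMULATORS[platform]]
```

Keeping platform-to-emulator and format-to-platform as separate tables means a new emulator for an existing platform becomes available to every format on that platform without touching the format table.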