Improving functionality for Spotlight and SDR content for full text and beyond

October 3, 2018
Mark A. Matienzo

Stanford Libraries recently announced the launch of Virtual Tribunals, a collaborative project with the WSD HANDA Center for Human Rights and International Justice, intended to be a broad platform for access to and research on records from international criminal tribunals. As noted in the launch announcement, the project has allowed Stanford Libraries to make incremental improvements to Spotlight at Stanford and to the parts of our infrastructure that support access to our digital collections. Cathy Aster recently wrote about the support added to Spotlight for internationalization of exhibit page content and user interface labels. In addition, several other features developed for Virtual Tribunals will be made more broadly available to Spotlight exhibit creators and for content managed in the Stanford Digital Repository (SDR).

Much of the work undertaken on the Virtual Tribunals project relates to supporting full-text search for items with text produced using optical character recognition (OCR). While the descriptive metadata produced for the records on Virtual Tribunals serves as a strong basis for discovery, full-text search was identified as a core deliverable for the grant project that funded the first phase of its implementation. Staff in Digital Library Systems and Services (DLSS) recognized that full-text search would provide transformative access to this collection and could improve access to many other collections. Based on the software improvements we made to Spotlight at Stanford, we are introducing features that support full-text search across items in an exhibit and show full-text matches as part of the item display in search results. Although this functionality was previously supported in The Edward A. Feigenbaum Papers exhibit, the work completed on Virtual Tribunals will serve as the basis for extending it to other exhibits hosted on Spotlight at Stanford.

Screenshot of full-text search results with highlighted matches

In addition, we have implemented the International Image Interoperability Framework (IIIF) Content Search API to allow “search within” functionality for a specific object that has page-level machine-readable text. This implementation supports searching within a particular document in a standards-compliant way that works across IIIF viewer software such as Universal Viewer and Mirador. The viewer can then show the location of matching hits and allow you to navigate to them. An example from an item in the Virtual Tribunals exhibit is shown below.

Search results in the Universal Viewer
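
To illustrate how a viewer might consume a Content Search response, here is a minimal Python sketch. It assumes the IIIF Content Search API 1.0 response shape, in which matches are returned as annotations whose "on" target is a canvas URI with an xywh fragment. The service URL, canvas URI, and sample response are hypothetical placeholders, not actual Stanford endpoints or data, and the helper names are illustrative only.

```python
from urllib.parse import urlencode, urldefrag


def build_search_url(service_url: str, query: str) -> str:
    """Append the Content Search `q` parameter to a search service URL."""
    return f"{service_url}?{urlencode({'q': query})}"


def extract_hits(annotation_list: dict) -> list:
    """Collect canvas, region, and matched text from each annotation."""
    hits = []
    for anno in annotation_list.get("resources", []):
        target = anno.get("on", "")
        if not isinstance(target, str):
            continue  # this sketch only handles simple string targets
        canvas, fragment = urldefrag(target)
        region = None
        if fragment.startswith("xywh="):
            try:
                region = tuple(int(v) for v in fragment[len("xywh="):].split(","))
            except ValueError:
                region = None  # non-integer regions are ignored in this sketch
        hits.append({
            "canvas": canvas,
            "region": region,
            "text": anno.get("resource", {}).get("chars", ""),
        })
    return hits


# Hypothetical response, shaped like a Content Search 1.0 annotation list.
SAMPLE_RESPONSE = {
    "@context": "http://iiif.io/api/search/1/context.json",
    "@type": "sc:AnnotationList",
    "resources": [
        {
            "@type": "oa:Annotation",
            "motivation": "sc:painting",
            "resource": {"@type": "cnt:ContentAsText", "chars": "tribunal"},
            "on": "https://example.org/iiif/abc/canvas/p3#xywh=120,340,210,32",
        }
    ],
}

if __name__ == "__main__":
    print(build_search_url("https://example.org/search/abc", "tribunal"))
    for hit in extract_hits(SAMPLE_RESPONSE):
        print(hit["canvas"], hit["region"], hit["text"])
```

A viewer would use each hit's canvas to decide which page to open and the region to draw the highlight box on the page image.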

Based on a feature request for Virtual Tribunals, we have also added functionality that allows an exhibit creator to add a search box to browse categories. This gives users an easier way to limit their query to a specific curated browse category.

Browse category search box in the Mario Paci exhibit 

While these improvements are intended for broader use by Spotlight exhibit creators, Digital Library Systems and Services staff are still investigating the remaining barriers to enabling full-text search for resources in SDR and working to ensure that we understand the functionality needed. These barriers include the wide variety of OCR formats that Stanford has used, the quality of the source images and the accuracy of the existing OCR, and the permissions on the digital objects. For example, items in the Joint Commission on Atomic Energy exhibit have text produced by OCR, but the SDR objects for these items require further remediation before the text can be made searchable. Although more work is needed to extend this functionality to other collections, we are excited about the possibilities for expanding search of full-text collections in Spotlight. If you have further questions about the status of this work, please contact Josh Schneider.

Author

Mark A. Matienzo

Collaboration & Interoperability Architect, Digital Library Systems and Services