Digital Library Blog

Stopwords in SearchWorks - to be or not to be

December 16, 2011

We've been examining whether or not to restore stopwords to the SearchWorks index. Stopwords are words ignored by a search engine when matching queries to results. Any list of terms can be a stopword list; most often the stopwords comprise the most commonly occurring words in a language, occasionally limited to certain functions (articles, prepositions vs. verbs, nouns).

The original usage of stopwords in search engines was to improve index performance (query matching time and disk usage) without degrading result relevancy (and possibly improving it!). It is common practice for search engines to employ stopwords; in fact Solr (http://lucene.apache.org/solr), the search engine behind SearchWorks, has English stopwords turned on as the default setting.

In our implementation of SearchWorks, there was no compelling reason to change most of the default Solr settings; thus, since SearchWorks's inception we have been using the following stopword list: a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not, of, on, or, s, such, t, that, the, their, then, there, these, they, this, to, was, will, with.
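To make the effect concrete, here is a minimal Python sketch of stopword filtering applied to a query. It is an illustration only, not how Solr implements its analysis chain: the helper name `significant_terms` and the whitespace tokenization are assumptions, and the only thing taken from SearchWorks is the stopword list quoted above.

```python
# Stopword list as quoted in this post (Solr's default English list).
STOPWORDS = frozenset(
    "a an and are as at be but by for if in into is it no not of on or s such "
    "t that the their then there these they this to was will with".split()
)


def significant_terms(query):
    """Hypothetical helper: lowercase the query, split on whitespace,
    and drop every term that appears in STOPWORDS."""
    return [term for term in query.lower().split() if term not in STOPWORDS]


# A title made up entirely of stopwords leaves nothing to match on.
print(significant_terms("To be or not to be"))

# A title with some stopwords keeps only its other words.
print(significant_terms("The Joy of Cooking"))
```

With this list, a query such as "to be or not to be" is reduced to no terms at all, which is the kind of search that motivates reconsidering stopwords.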

What follows is an analysis of how stopwords are currently affecting SearchWorks, and what might happen if we restore stopwords to SearchWorks, making every word significant for every search.

KEEP - Keeping Emulation Environments Portable

December 12, 2011
by Michael G Olson

I recently attended a workshop of the KEEP project (Keeping Emulation Environments Portable) in Rome. KEEP is an EU-funded project to develop software that virtualizes old computer hardware and software environments. This allows you to run old operating systems and the applications that were designed for them on modern computers. The KEEP project is a multi-partner project that includes a consortium of national libraries (BNF, Koninklijke Bibliotheek), the University of Portsmouth, a computer history museum (Computerspiele Museum), commercial partners (Tessella), and the European Game Developers Association.

The project is scheduled to end in February 2012 and has already released software version 1.0.0 on SourceForge ( http://emuframework.sourceforge.net/ ). This version supports:
* 5 platforms: x86, C64, Amiga, BBC Micro, Amstrad
* 6 emulators included: Dioscuri, Qemu, VICE, UAE, BeebEm, JavaCPC
* 22 file formats supported: PDF, TXT, XML, JPG, TIFF, PNG, BMP, Quark, ARJ, EXE, disk/tape images and more
* Integration with format identification FITS
* Web services for software and emulator archives

Tell us what you think!

December 7, 2011
by Ray Heigemeir

Hello all!

This is just a friendly reminder that the new SULAIR website preview is available for your viewing pleasure! We have begun receiving valuable comments which will inform our continued building-out of the site. We encourage the Stanford community to continue sending in requests, comments, complaints, questions, and praise. The feedback link may also be accessed on the preview site. Happy exploring!

Website Preview and Testing at the Library Open House

November 9, 2011
by Stuart Snydman

The new library website had a table at the Library Open House, where we did some light-weight testing and previewed a live test site to visitors. It was a big success: the table received approximately 50 visitors, 21 of whom participated in the live test. The breakdown of testers included:

  • 10 undergraduates
  • 4 graduate/professional school
  • 3 lecturers/instructors/visiting scholars
  • 2 library staff
  • 2 other SU staff

Testers were directed to a laptop and asked to perform 3 to 5 common website tasks. Charles Kerns introduced the tasks and recorded whether or not each one was successfully completed. All tests were recorded using Camtasia so we can replay them as videos and analyse how testers navigated the site to accomplish common tasks. The following is a list of some of the tasks we tested:

Currently in the labs - Materials from the Monuments of Printing Exhibition, Part 1

October 21, 2011
by Astrid Johannah Smith

Re-Posted from the Special Collections and Archives Exhibits Program listing -

The Monuments of Printing Exhibition highlights the first 250 years of printing in the West

Johannes Gutenberg's printing of a Bible from movable type in Mainz, Germany in 1455 marked the beginning of a communication revolution in the West. Printers were able to reproduce texts efficiently in quantities virtually unimaginable to a scribe. Monuments of Printing: from Gutenberg through the Renaissance, the first of two exhibitions spanning five-hundred years of printing history, demonstrates the development of typography and printing in Europe over a 250-year period as seen in selected works in the rare book collections of the Stanford University Libraries. The exhibition will open Monday, August 1, in the Peterson Gallery and Munger Rotunda on the second floor of the Bing Wing of Green Library, Stanford University, and is free and open to the public.

New Visual Design - Hours & Locations

October 7, 2011
by Sarah E Lester

We are excited to share with you a preview of another section of the new library website. We are especially proud of the new look for Hours & Locations, which makes this critical information much more accessible to patrons. Moreover, this redesign leverages Drupal's content management function to provide library staff with a much simpler, more streamlined back-end process for gathering and displaying hours and location information.

Library Website Development is Right on Schedule!

September 6, 2011

The work is divided into month-long "sprints". Sprints are intense work cycles in the Agile software development methodology. During these cycles, stakeholders and developers agree on priority tasks and functionality for each sprint.

Sprint 1 includes:
* getting the development website up and running
* implementing the website "theme"
* creating home, about, project, ask us, and search pages
