Robotic Book Scanning at Stanford University
OCR and derivative creation

Once the quality control operator confirms that a book’s images meet the project’s standards, the set of images is sent to a cluster of workstations that convert the page images to editable text using Optical Character Recognition (OCR). The OCR process is entirely automated: no human operators correct errors in the converted text. OCR accuracy varies greatly with the quality of the original printed page and of the scanned image. After OCR is complete, derivative files such as image-only PDF, searchable PDF, JPG, and ASCII text are created, again by a completely automatic process.
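
The two automated stages described above (OCR on each approved page image, then derivative creation) can be illustrated with a minimal Python sketch. This is not the project's actual software: the function names, the file suffixes, the use of one output path per derivative type, and the stand-in OCR function are all assumptions made for illustration, and a real OCR engine would be supplied in place of the stub.

```python
from pathlib import Path
from typing import Callable, Dict, List

# Derivative types named in the text. The suffixes are hypothetical,
# and one path per type per book is a simplification.
DERIVATIVE_SUFFIXES = {
    "image_pdf": ".image.pdf",
    "searchable_pdf": ".searchable.pdf",
    "jpg": ".jpg",
    "ascii_text": ".txt",
}


def plan_derivatives(book_id: str, out_dir: Path) -> Dict[str, Path]:
    """Map each derivative type to a hypothetical output path for one book."""
    return {
        kind: out_dir / f"{book_id}{suffix}"
        for kind, suffix in DERIVATIVE_SUFFIXES.items()
    }


def process_book(
    book_id: str,
    pages: List[Path],
    ocr_page: Callable[[Path], str],
    out_dir: Path,
) -> dict:
    """Run OCR on every approved page image, then plan the derivatives.

    There is deliberately no manual correction step: the OCR output is
    used as-is, mirroring the fully automated process in the text.
    """
    page_texts = [ocr_page(page) for page in pages]
    # Reduce the text to plain ASCII; unsupported characters become "?".
    ascii_text = "\n".join(page_texts).encode("ascii", errors="replace").decode("ascii")
    return {
        "book_id": book_id,
        "text": ascii_text,
        "derivatives": plan_derivatives(book_id, out_dir),
    }


def fake_ocr(page: Path) -> str:
    """Stand-in for a real OCR engine (assumption, for demonstration only)."""
    return f"text of {page.name}"


if __name__ == "__main__":
    result = process_book(
        "book001", [Path("p1.tif"), Path("p2.tif")], fake_ocr, Path("out")
    )
    for kind, path in result["derivatives"].items():
        print(kind, path)
```

Passing the OCR step in as a function keeps the sketch independent of any particular OCR product, which the text does not name.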

©2004 The Board of Trustees of the Leland Stanford Junior University. All rights reserved.