Finding and Using Text Data


February 21, 2018
2:00pm to 4:00pm
This workshop surveys the vast landscape of full-text data collections available to the Stanford research community, including both licensed and publicly accessible text corpora.  We will also discuss the varieties of textual data along a spectrum that ranges from dirty OCR, through higher-quality but unstructured texts, to highly curated texts that are richly described according to standards such as those of the Text Encoding Initiative (TEI).  What do these various formats enable, and what do they impede?
