Finding and Using Text Data
This workshop surveys the broad range of full-text data collections available to the Stanford research community, including both licensed and publicly accessible text corpora. We will also discuss the varieties of textual data along a spectrum that runs from uncorrected ("dirty") OCR, through higher-quality but unstructured texts, to highly curated texts that are richly described according to standards such as those of the Text Encoding Initiative (TEI). What do these various formats enable, and what do they impede?
This workshop is offered by Stanford Libraries' Center for Interdisciplinary Research as part of its mission to provide training in technical academic research practices and applied research methods.