SDR Deposit of the Month: The Stanford Study of Writing

August 4, 2019
Hannah Frost

How do students develop as writers? How do we study the process of writing development? How can we apply what we learn to improve writing instruction? These are the primary questions driving the Stanford Study of Writing, a research project led by Andrea Lunsford, the Louise Hewlett Nixon Professor of English Emerita, from 2001 to 2006. The dataset resulting from this landmark study – over 13,000 writing samples – is now archived and accessible in the Stanford Digital Repository.

The study involved a random sample of students entering the Stanford class of 2005; the cohort of 189 students (roughly 12% of the class) agreed to submit all of the writing they produced for their classes and, optionally, any extracurricular writing. Annual surveys and elective interviews with the students were also conducted. The material was gathered and hosted in a database supported by the IT department of the Vice Provost for Undergraduate Education (VPUE).

Lunsford comments:

When I started doing research on undergraduate student writing in the late 1970s, archives were very far and few between: Harvard had a cache of 3,000 student papers put there to testify to the ‘illiteracy of American boys’, and by accident I stumbled on a batch of mid-19th century student writing at a library in Scotland, but systematic archiving of student work was not a priority…. The Stanford Study of Writing was an early attempt to capture as much of the in-class AND out-of-class writing (in any medium or genre or mode) of a representative sample of the Stanford students and to store it electronically.  From the beginning, one of our goals was to make as much of this material available to other scholars. We believe there is still much to be learned from studying this particular slice of undergraduate student writing.

Recognizing that this dataset had long-term value as a unique reflection of the Stanford student experience, University Archives had set its sights on preserving it in its holdings. Yet a decade ago the SDR, then in early development, was not capable of preserving a dataset of this kind, so as the research project wound down, the dataset rested quietly in the VPUE-hosted system.

Then one day in 2017, a Graduate School of Education doctoral student contacted the SDR team seeking guidance on locating and accessing the data. The inquiry kicked off a multi-pronged effort to extract the material from the original database, anonymize and clean the data, and organize it for sharing with others. One of the key people in this preparatory effort has been Noah Arthurs, currently a Master’s student in Computer Science. Once the data was ready, Arthurs applied computational methods, specifically natural language processing, to analyze the dataset; this work led to an article, “Structural Features of Undergraduate Writing: A Computational Approach,” published in the Journal of Writing Analytics in 2018.

Jenn Fishman (English PhD, 2004) was involved in conducting the study while it was underway and has also played a key role in enabling use of the archive through the SDR. Fishman says:

The Stanford Study of Writing was my entry point into undergraduate writing research.... By 2006, we had amassed a dataset containing more than 13,000 individual documents, which we hoped researchers might use to better understand college writing and improve writing education. It was not until 2009, when I was writing about research methods, that I realized we also had created an archive, specifically an historical archive that students, teachers, and researchers might study to better understand the past. Ten years later, thanks to the Stanford Digital Repository, this new collective work can get started. 

Lunsford and Fishman are in the early stages of planning an online exhibit about the SSW, featuring publications, related research and other contextual materials to enrich use of the dataset by researchers.