Breaking news! New York Times TDM Archive open for research.

December 6, 2021
Regina Lee Roberts

The New York Times TDM Archive (1980-2020) is now available to Stanford University researchers for text data mining (TDM) research projects. Researchers can access the text and metadata of New York Times articles from this period, encoded as XML objects.

This 40-year digital text archive consists of approximately 3 million articles published by The New York Times, including but not limited to news, lifestyle, opinion, and The New York Times Magazine. However, the collection excludes reader comments, paid obituaries, and the kids section. Online news apps that display dynamic data for stories are also excluded.

For access:

  • Stanford researchers (with SUNet IDs) must agree to the terms of a Data Use Agreement (DUA), which is available as a link in the Searchworks record.
  • The research must be for non-commercial and academic purposes.
  • Instructions for using the XML files are included in the data documentation.
  • NOTE: This archive does not provide access to the NYT online daily news. Please see our Newspaper and News Sources guide for information on how to search for select NYT news articles in the indexes.

This New York Times TDM Archive builds on Stanford Libraries' efforts to provide researchers with newspaper corpora for TDM research projects.

Stanford Libraries has also negotiated access to The Washington Post Archival Data (1977-Present), encoded as JSON objects. This collection is updated quarterly.

Additionally, the Stanford community now has access to ProQuest historical newspaper corpora via ProQuest TDM Studio. Please visit the ProQuest TDM Studio library guide page for details.

Interested in finding other newspaper titles or indexes?

We have a library guide for that. The Newspaper and News Sources guide was created to help our community navigate our amazing collections of print, microfilm, aggregated news indexes, and digital news sources from around the globe.

As a reminder:

Stanford Libraries has also negotiated access to the Washington Post online daily news for authenticated Stanford community members. For more information about WaPo online news access, please visit the blog post from January 2020.

If you have additional questions, please contact: Regina Roberts.

Written by: Kate Barron (Research Data Curator) and Regina Roberts (Librarian for Communication and Journalism). 

Regina Lee Roberts
Head of Social Sciences Resource Group
Bibliographer for Anthropology & Archaeology
Communication & Journalism
Feminist Studies
and Lusophone Africa Collections
Covering for Sociology Librarian (Interim)