Best practices for file formats

The file formats you use have a direct impact on your ability to open those files at a later date and on the ability of other people to access those data.


Proprietary vs. open formats

You should save data in a non-proprietary (open) file format when possible. If converting to an open format would lose some data from your files, consider saving the data in both the proprietary format and an open format. Having at least some of the information available to you later is better than having none of it available!

When it is necessary to save files in a proprietary format, consider including a readme.txt file in your directory that documents the name and version of the software used to generate the files, as well as the company that made the software. This could help you down the road if you need to figure out how to open these files again!
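As a minimal sketch of the readme.txt suggestion above, the following Python function writes the software name, version, and vendor next to the data files. The function name `write_readme` and the field labels are illustrative assumptions, not a prescribed layout; adapt them to whatever your project needs to record.

```python
from datetime import date
from pathlib import Path


def write_readme(directory, software, version, vendor, files):
    """Write a plain-text readme.txt describing how the listed files were made.

    The field labels below are an assumed layout, not a standard.
    """
    lines = [
        f"Date written: {date.today().isoformat()}",
        f"Software: {software}",
        f"Version: {version}",
        f"Vendor: {vendor}",
        "Files created with this software:",
    ]
    lines += [f"  - {name}" for name in files]

    readme = Path(directory) / "readme.txt"
    # UTF-8 plain text keeps the readme itself in an open format.
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme
```

For example, `write_readme(".", "ExampleStats", "4.2", "Example Software Inc.", ["survey.xyz"])` would create a readme.txt in the current directory; the software, vendor, and file names in that call are hypothetical placeholders.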

The Library of Congress has published a Recommended Formats Statement that discusses this topic in great depth.

Guidelines for choosing formats

When selecting file formats for archiving, the formats should ideally be:

  • Non-proprietary
  • Unencrypted
  • Uncompressed
  • In common usage by the research community
  • Adherent to an open, documented standard, such as the one described by the State of California (see AB 1668, 2007), that is:
    • Interoperable among diverse platforms and applications
    • Fully published and available royalty-free
    • Fully and independently implementable by multiple software providers on multiple platforms without any intellectual property restrictions for necessary technology
    • Developed and maintained by an open standards organization with a well-defined inclusive process for evolution of the standard

Some preferred file formats

  • Containers: TAR, GZIP, ZIP
  • Databases: XML, CSV
  • Geospatial: SHP, DBF, GeoTIFF, NetCDF
  • Moving images: MOV, MPEG, AVI, MXF
  • Sounds: WAVE, AIFF, MP3, MXF
  • Statistics: ASCII, DTA, POR, SAS, SAV
  • Still images: TIFF, JPEG 2000, PDF, PNG, GIF, BMP
  • Tabular data: CSV
  • Text: XML, PDF/A, HTML, ASCII, UTF-8
  • Web archive: WARC
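
One way to apply the list above is to scan a project directory and flag files whose extensions fall outside it. The sketch below is a rough aid under stated assumptions: the `PREFERRED_EXTENSIONS` set is our own guess at common extensions for the listed formats (it is incomplete, and some entries such as `.sas` are approximations), and the function name `find_non_preferred` is hypothetical. An extension is only a hint, so this cannot tell PDF from PDF/A or confirm a text file's encoding.

```python
from pathlib import Path

# Assumed extensions for the formats listed above; illustrative and incomplete.
PREFERRED_EXTENSIONS = {
    ".tar", ".gz", ".zip",                       # containers
    ".xml", ".csv",                              # databases, tabular data
    ".shp", ".dbf", ".tif", ".tiff", ".nc",      # geospatial (GeoTIFF, NetCDF)
    ".mov", ".mpg", ".mpeg", ".avi", ".mxf",     # moving images
    ".wav", ".aif", ".aiff", ".mp3",             # sounds
    ".txt", ".dta", ".por", ".sas", ".sav",      # statistics, plain text
    ".jp2", ".pdf", ".png", ".gif", ".bmp",      # still images
    ".html", ".htm",                             # text
    ".warc",                                     # web archive
}


def find_non_preferred(directory):
    """Return sorted paths of files whose extension is not in the assumed set."""
    root = Path(directory)
    return sorted(
        p for p in root.rglob("*")
        # Compare case-insensitively so IMAGE.TIF matches ".tif".
        if p.is_file() and p.suffix.lower() not in PREFERRED_EXTENSIONS
    )
```

Files returned by `find_non_preferred("my_project")` (a placeholder directory name) are candidates for conversion to an open format, or for documentation in a readme.txt if they must stay in a proprietary one.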

See the Library of Congress's Sustainability of Digital Formats website for more complete listings and discussions of formats, including guidance for the preservation of data sets, geospatial data, and web archives. You can also visit the Library of Congress's page on Recommended Format Specifications for preservation.

File formats case study

Image: File formats case study, by Amy Hodge

View the file formats case study for real-life examples of problems you could encounter if you don't make good file format choices!