Case study: File formats

These examples come from digital research data that Science Data Librarian Amy Hodge collected from 1997 to 1999 for her dissertation research. They illustrate some of the problems you might encounter if you do not choose appropriate file formats for your data. See best practices for file formats for more details.

Example 1

The files in the screen shot below were saved in proprietary formats (.adt and .gel) produced by a piece of equipment called a phosphorimager. Amy no longer has access to the software and does not remember its name. An internet search did not turn up any useful information about software that might be able to open these files. To avoid a problem like this, save or export files in an open file format whenever possible, so that they are more likely to remain accessible in the future.

Mac finder window showing files with unrecognized file types, screen shot by Amy Hodge

In the instance shown below, the phosphorimager data was saved in the proprietary format but was also exported as a .tif file. TIF is an open format that a wide variety of software can still read, even 15 years later.

Mac finder window showing files with unrecognized file types along with exported TIF version, screen shot by Amy Hodge

Example 2

In this example, the file named GROWTH CURVE EQUATION was saved as a .eqn file. Amy's computer could not open this file automatically, but it turned out to be readable in a text editor. The file should have been saved with a .txt extension (alongside the .eqn version, if the software she was using required that extension). It contains a script for calculating best fit curves for the growth curve analysis.
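When a file has an unfamiliar extension, a quick check can suggest whether it is worth opening in a text editor, as the .eqn file was. The following is a minimal sketch of one such heuristic, not a definitive test: the function name looks_like_text, the 4096-byte sample size, and the 95% threshold are all assumptions chosen for illustration.

```python
# Bytes counted as "text-like": printable ASCII plus tab, line feed, and
# carriage return (older Mac files often use carriage returns as line endings).
TEXT_BYTES = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}


def looks_like_text(path, sample_size=4096, threshold=0.95):
    """Guess whether a file is plain text by inspecting its first bytes.

    This is a heuristic only. It can be wrong in both directions, for
    example for text that uses many non-ASCII characters.
    """
    with open(path, "rb") as handle:
        sample = handle.read(sample_size)
    if not sample:
        return False  # an empty file gives no evidence either way
    if 0 in sample:
        return False  # NUL bytes are rare in plain text, common in binary data
    text_like = sum(byte in TEXT_BYTES for byte in sample)
    return text_like / len(sample) >= threshold
```

Calling looks_like_text on a file path returns True when nearly all of the sampled bytes are printable, which suggests that a text editor is a reasonable first thing to try. A False result does not prove that the file is unreadable, only that it is probably not plain text.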

Mac finder window showing spreadsheet and equation files with unrecognized file types, screen shot by Amy Hodge

The growth curve data shown was saved in a proprietary spreadsheet format, .pgw, that Amy can no longer open. The software used to generate these files is not known. An internet search indicated that .pgw is a map file, which is not what these files are. These data tables should have been exported as .csv or some other open format to preserve their future readability.
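For data that is still accessible in the software that created it, exporting a plain .csv copy can be very simple. The sketch below uses Python's standard csv module. The function name export_table_to_csv, the column names, the values, and the output file name are all hypothetical, made up for illustration; none of them come from the original data.

```python
import csv


def export_table_to_csv(header, rows, path):
    """Write a header row and data rows to a UTF-8 encoded CSV file."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


if __name__ == "__main__":
    # Made-up illustrative values, not data from the dissertation.
    header = ["time_hours", "measurement"]
    rows = [(0, 0.05), (2, 0.11), (4, 0.42)]
    export_table_to_csv(header, rows, "growth_curve.csv")
```

Any spreadsheet program or text editor can open the resulting file, so it does not depend on the continued availability of one particular piece of software.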