Case study: Spreadsheets

The first two images below are examples from a collection of digital research data collected by Science Data Librarian Amy Hodge from 1997-1999 for her dissertation research. The third image shows calculations for determining the costs involved in making a quilt by hand. They illustrate a number of poor practices in spreadsheet usage.

Interested in more information about data best practices? Check out our workshop!

Example 1

At least five best practices are ignored in this first spreadsheet. See if you can spot the problems before you scroll down to reveal the answers!

Excel spreadsheet window with examples of poor practices, screen shot by Amy Hodge

Best practice problems

  • Headers should be in a single row. This spreadsheet contains headers in row 1 and row 2, as well as redundant headers for two columns in rows 14 and 26 that are confusing. There should be a single unique header for each column, preferably in row 1. In addition, many of the columns in this spreadsheet do not have headers at all!
  • Avoid the use of special characters. Special characters are often not exported correctly or not read correctly by other software programs. In this case, it would be easy to remove the percent signs from all of the cells in column H and then indicate in the header that the values in this column are percentages.
  • Do not leave empty rows or columns. These may cause problems when data are exported. Empty rows and columns also tend to indicate the presence of multiple tables in one sheet, which appears to be the case with this example. A sheet should contain only one table of data.
  • Do not leave empty cells. Empty cells sometimes cause problems for other software or when exporting data. Empty cells are also confusing, because it is unclear why the cell is empty. Was this not measured? Did the value seem unreliable so was omitted? Was this value deleted by accident? If a cell must be left empty, make a notation in a column of comments about why the cell is empty.
  • Do not embed charts, graphs, or images. They are not included when data are exported. Charts and graphs should be included in a separate sheet if necessary. These may also be exported as images from Excel. Also note that the chart in this spreadsheet has no labels of any kind!

Example 2

At least three best practices are ignored in this second spreadsheet. See if you can spot the problems before you scroll down to reveal the answers!

Another Excel spreadsheet window with examples of poor practices, screen shot by Amy Hodge

Best practice problems

Do not use colored text or cell shading. This formatting is lost when the data are exported or the file is opened in another program. Consider adding another column where the information indicated by the coloring and shading can be included in text form.

Do not use commas. When data are exported as a comma separated value (.csv) file, the commas within your data cells can cause confusion. They also sometimes indicate that multiple pieces of data (like city, state) are included in a single column. Separate data into multiple fields if appropriate or use another kind of separator (colon, semicolon -- but not a symbol!) if one is necessary. 

Do not merge cells. Cell merging is typically lost when data are exported and is often misread by other software programs. This may result in the shifting of data cells in the affected rows or columns. In the case shown here, the cells in column 1 should be unmerged and the date information entered into the first cell in each row.

Example 3

At least two best practices are ignored in this third spreadsheet. See if you can spot the problems before you scroll down to reveal the answers!

Apple Numbers spreadsheet window with examples of poor practices, screen shot by Amy Hodge

Best practice problems

Do not mix data types in a single column. In this column the first 10 rows are length measurements, then there is one row with the value "2 bobbins," followed by seven rows that are amounts of time. This is very confusing. Create new columns for different types of data, and avoid mixing text and numbers in the same column if possible.

Do not embed comments. These are lost when the file is exported. Consider creating a new column where comments can be captured.