Managing data FAQ

It is getting really hard for me to dig through my own data and find things. Can you help?

Keeping all of your research data organized is sometimes a real challenge. We can help you put together a system that will make finding things in the future easier, whether your data are in a paper notebook, on a DVD, or stored on the campus network (or all of the above!).

I'm concerned that as people come and go from my research group, data will be lost. How can we make sure that this doesn't happen (again)?

It is the nature of an academic laboratory that people come and go, and when each individual is responsible for their own data, keeping track of it all after someone leaves can be tricky. How you manage this depends on your individual lab and the kinds of data that you collect, but we are happy to help you put together a system that is sustainable and practical.

I usually store my data in Excel spreadsheets. Is that adequate?

It depends.

Data analysis: While you are actively analyzing your data, Excel may be the most convenient place for it. However, Excel file formats change periodically, so you may not even be able to open your own data files several years down the road if you only save them in Excel format.

Spreadsheet vs. database: If you have a lot of different fields in your data sets and you want to be able to search on intricate combinations of those fields, a spreadsheet is probably not the answer for you. You might consider using a simple database, which incorporates much more powerful searching. We can help you decide if this is what you need, and we can probably help you get a simple MySQL database up and running!
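
To give a feel for what searching on a combination of fields looks like in a database, here is a minimal sketch. It uses SQLite through Python's built-in sqlite3 module as a stand-in for MySQL, simply because it needs no server to try out. The table name, column names, and sample values are all made up for illustration.

```python
import sqlite3

# Hypothetical sample measurements; the fields are illustrative only.
rows = [
    ("S001", "mouse", "liver", 37.0, 0.82),
    ("S002", "mouse", "kidney", 37.0, 0.41),
    ("S003", "rat", "liver", 25.0, 0.67),
    ("S004", "mouse", "liver", 25.0, 0.93),
]


def build_database(records):
    """Load records into an in-memory SQLite database and return the connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE samples ("
        "sample_id TEXT PRIMARY KEY, species TEXT, tissue TEXT, "
        "temperature_c REAL, signal REAL)"
    )
    conn.executemany("INSERT INTO samples VALUES (?, ?, ?, ?, ?)", records)
    conn.commit()
    return conn


def find_samples(conn, species, tissue, min_signal):
    """Return the IDs of samples that match all three criteria at once."""
    cursor = conn.execute(
        "SELECT sample_id FROM samples "
        "WHERE species = ? AND tissue = ? AND signal >= ? "
        "ORDER BY sample_id",
        (species, tissue, min_signal),
    )
    return [row[0] for row in cursor]


conn = build_database(rows)
print(find_samples(conn, "mouse", "liver", 0.8))
```

The same question in a spreadsheet would take several stacked filters. In a database it is a single query that can be saved and rerun as the data grow.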

Long-term storage and preservation: Because proprietary formats like those used by Excel change over time and are not readable by all possible users, it is best to use something like a comma-separated values (.csv) format for data sharing and long-term preservation. See best practices for file formats for more information about appropriate formats for other types of data, like text, images, sounds, video, and databases. You might also want to check out our case study on spreadsheets!
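
As a rough illustration of how simple a .csv file is, the sketch below writes a small table to plain text and reads it back using only Python's standard csv module. The file name, column names, and values are hypothetical. In practice you would more likely use Excel's own "Save As" option to produce the .csv copy.

```python
import csv
import os
import tempfile


def save_as_csv(path, header, rows):
    """Write a header row and data rows to a UTF-8 encoded CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


def load_csv(path):
    """Read the CSV back as a list of dictionaries keyed by column name."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# Hypothetical data; descriptive column names double as basic metadata.
header = ["sample_id", "collected_on", "mass_g"]
rows = [["S001", "2014-03-02", 1.25], ["S002", "2014-03-03", 0.98]]

path = os.path.join(tempfile.mkdtemp(), "samples.csv")
save_as_csv(path, header, rows)
records = load_csv(path)
print(records[0])
```

Note that every value comes back as text. A .csv file does not record data types, units, or formulas, so those details need to be written down somewhere else.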

I need to do a better job at backing up my data while my research is still in progress. Do you provide backup services?

We do not provide backup services, but ITS does, as may your department or research institute. See our table of data storage and backup options (coming soon!) for more information about some of the choices available to you.

Do you provide working space or software for data analysis?

We do not provide working space or software for data analysis. For working space, you may need to contact your department IT or campus ITS. If you need help with quantitative, qualitative, or data conversion software, you may want to consider consulting with the Social Science Resource Center.

What are ontologies and metadata? Do I need to use them?

Metadata is information that describes your data. You have probably been collecting metadata for years in your lab notebook -- who did the experiment, when it was done, what you were hoping to learn, what samples were used, what the experimental conditions were, why you tossed out certain data points, what each value on a 1-10 scale corresponded to, etc.

If you are sharing your data with others, you also need to include information about who has permission to use it and what they have permission to do with your data.

Metadata is important for anyone (including yourself!) who is viewing your data files in the future. Imagine a file with 20 columns and 5000 rows of numbers with nothing but cryptic abbreviations for headers. Or a folder full of images labeled "IMG_5473" etc. Data like that are likely to be of no use to anyone.
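
One lightweight way to avoid the cryptic-header problem is to keep a small "data dictionary" file next to the data. The sketch below is only one possible layout, not a standard: the field names, the lab name, and the column abbreviations are all invented for illustration. It stores the descriptions as JSON and checks whether any column in a data file has been left undocumented.

```python
import json

# Hypothetical data dictionary; every name and value here is made up.
metadata = {
    "title": "Growth assay, plate series A",
    "creator": "Example Lab",
    "collected": "2014-03-02",
    "permissions": "Contact the lab before reuse",
    "columns": {
        "tmp_c": {
            "description": "Incubation temperature",
            "units": "degrees Celsius",
        },
        "rating": {
            "description": "Visual growth score",
            "codes": {"1": "no growth", "10": "confluent"},
        },
    },
}


def undocumented_columns(header, metadata):
    """Return the column names in header that the data dictionary does not describe."""
    return [name for name in header if name not in metadata["columns"]]


# Header row as it might appear in the data file.
header = ["tmp_c", "rating", "od600"]
print(undocumented_columns(header, metadata))

# The text that would be saved alongside the data, e.g. as a .json file.
sidecar_text = json.dumps(metadata, indent=2)
```

A plain-text README that answers the same questions works just as well. What matters is that the explanation is stored with the files.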

Controlled vocabularies are lists of terms with their definitions that are used for a specific purpose, such as to describe a part of the anatomy or an experimental protocol. In an ontology, terms are typically organized into hierarchical structures that also express properties of the terms and relationships between the terms, such as whether term A is a subset of term B. Using controlled vocabularies and ontologies makes it clear what you mean (because agreed definitions are part of the terms) and makes it easier to search across many data sets for information of a particular type or to perform automated processing on massive sets of data.
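
The toy example below sketches how a hierarchy of terms can make searching easier. It is not a real ontology: the terms, the "is a" links, and the data set names are invented, and real ontologies carry much richer relationships. The idea is simply that a search for a broad term can also find records labeled with its narrower terms.

```python
# Hypothetical mini-ontology: each term points to its broader "parent" term.
IS_A = {
    "hepatocyte": "liver cell",
    "liver cell": "cell",
    "neuron": "cell",
}

# Hypothetical data sets, each annotated with one term from the vocabulary.
ANNOTATIONS = {
    "dataset_01": "hepatocyte",
    "dataset_02": "neuron",
    "dataset_03": "liver cell",
}


def ancestors(term, is_a):
    """Return every broader term above the given term, nearest first."""
    result = []
    seen = {term}
    while term in is_a:
        term = is_a[term]
        if term in seen:  # guard against accidental loops in the hierarchy
            break
        seen.add(term)
        result.append(term)
    return result


def is_subtype(term, candidate, is_a):
    """True if candidate is a broader term somewhere above term."""
    return candidate in ancestors(term, is_a)


def search(annotations, query, is_a):
    """Find data sets labeled with the query term or any narrower term."""
    return sorted(
        name
        for name, term in annotations.items()
        if term == query or is_subtype(term, query, is_a)
    )


print(search(ANNOTATIONS, "liver cell", IS_A))
```

Without the hierarchy, a search for "liver cell" would miss the data set labeled "hepatocyte" even though it is relevant.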

Do you have people who can assist me in the preparation of metadata?

Our metadata analyst, Kim Durante, can help you with questions about creating and organizing metadata for your research data -- what it is, what to use, and how to use it effectively and efficiently. Contact us at ask-data-services@lists.stanford.edu, or contact Kim Durante directly at kdurante@stanford.edu for your metadata questions.

Do you work with graduate students? How about undergraduates? I would like to include better data management skills in the undergraduate/graduate curriculum I teach. Can you help me with that?

We love students! We believe that it is never too soon to get into the habit of good data management. If you are interested in consulting with us about how to incorporate better data management practices into an undergraduate or graduate curriculum, or would like to have us collaborate with you to teach classes that drive home the value of better data management, we would love to talk with you. Please contact us!