5 Tips for writing a great data management plan

October 9, 2015
Amy E. Hodge

Check out these 5 tips from the trenches on writing a great data management plan (DMP) for your next research grant proposal. These tips are based on feedback given on real-life data management plans! Find out more about data management plans from our web site.

1. Consider all outputs of your research as "data."

You may not be thinking of "data" the same way that your funding agency does. Let's say that you're designing a specific type of algorithm. The primary output from your research will be a software package. So, while you may think you haven't generated any "data," your funding agency may think differently. Treat anything your research produces as "data" and account for it in your plan, including data sets, software, visualizations, and physical objects (such as collected samples and new strains of laboratory species).

2. Be clear about the data you use. 

If you are using publicly available data or data from some other source, carefully document the source of that data, including how and when you accessed it. Even better -- if copyrights and licenses allow -- provide direct access to the exact data set that you used. Providing a link to a public data set may be adequate, but can you be sure the data set will continue to be available at that site, or that it will not be replaced by a different version? For others to evaluate or reproduce your work, it may be critical that they have access to, or at least knowledge of, the exact data sets you used.
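One way to document a downloaded data set is to save a small provenance record next to the file. The sketch below is only an illustration, not a prescribed format: the function names, the field names, and the `.provenance.json` suffix are all assumptions made for this example. It records the source URL, how the file was obtained, a timestamp, and a SHA-256 checksum so that others can tell whether they have the exact same file.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_provenance(data_file: Path, source_url: str, how_accessed: str) -> Path:
    """Write a JSON sidecar file describing where a data file came from.

    The field names are hypothetical; adapt them to your own conventions.
    The timestamp is the moment this function runs, so call it right
    after obtaining the file.
    """
    record = {
        "file": data_file.name,
        "source_url": source_url,
        "how_accessed": how_accessed,
        "accessed_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "size_bytes": data_file.stat().st_size,
        "sha256": sha256_of(data_file),
    }
    sidecar = data_file.with_name(data_file.name + ".provenance.json")
    sidecar.write_text(json.dumps(record, indent=2) + "\n", encoding="utf-8")
    return sidecar
```

Calling `write_provenance(Path("survey.csv"), "https://example.org/survey.csv", "downloaded via web browser")` (a placeholder file name and URL) would create `survey.csv.provenance.json` alongside the data. Keeping the checksum lets a later reader confirm that a copy of the data set matches the version you analyzed.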

3. Provide end-to-end code for the generation of figures in publications.

Sharing the code behind your figures is one of the newer research best practices. It is frustrating not to have the details on how a figure was generated, and even more frustrating to try to replicate it and fail. To avoid this problem, use software (R, MATLAB, Python -- pick your favorite!) to write a script that analyzes your dataset and produces a figure or graph at the end. You can then include the figure directly in your publication, as well as the script used to generate it. The real beauty here, beyond the ease with which you can regenerate the image if you find an error in your work, is the transparency it provides about how the analysis was done and the figure generated.
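As a rough illustration of what "end-to-end" can mean, here is a minimal Python sketch that goes from a raw CSV file to a finished figure in one script. Everything specific in it is an assumption made for the example: the input is imagined to have `group` and `value` columns, the analysis is just a per-group mean, and the chart is written as a simple SVG using only the standard library so the script has no dependencies. In practice you would more likely use a plotting library of your choice; the point is that one script performs every step.

```python
import csv
import statistics
from collections import defaultdict
from pathlib import Path
from xml.sax.saxutils import escape


def load_groups(csv_path: Path) -> dict:
    """Read rows with hypothetical 'group' and 'value' columns."""
    groups = defaultdict(list)
    with csv_path.open(newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            groups[row["group"]].append(float(row["value"]))
    return dict(groups)


def summarize(groups: dict) -> dict:
    """Analysis step: mean value per group, in sorted group order."""
    return {name: statistics.mean(values) for name, values in sorted(groups.items())}


def render_bar_chart(means: dict, width: int = 400, height: int = 300) -> str:
    """Figure step: return an SVG bar chart (assumes non-negative means)."""
    if not means:
        raise ValueError("no data to plot")
    margin = 40
    top = max(means.values())
    slot = (width - 2 * margin) / len(means)  # horizontal space per bar
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">']
    for index, (name, mean) in enumerate(means.items()):
        bar_height = (height - 2 * margin) * mean / top if top > 0 else 0
        x = margin + index * slot + slot * 0.1
        y = height - margin - bar_height
        parts.append(
            f'<rect x="{x:.1f}" y="{y:.1f}" width="{slot * 0.8:.1f}" '
            f'height="{bar_height:.1f}" fill="steelblue"/>'
        )
        parts.append(
            f'<text x="{x + slot * 0.4:.1f}" y="{height - margin + 15}" '
            f'text-anchor="middle" font-size="12">{escape(name)}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


def make_figure(csv_path: Path, svg_path: Path) -> dict:
    """Run the whole pipeline: load, analyze, and write the figure."""
    means = summarize(load_groups(csv_path))
    svg_path.write_text(render_bar_chart(means), encoding="utf-8")
    return means
```

Running `make_figure(Path("measurements.csv"), Path("figure1.svg"))` (placeholder file names) regenerates the figure from the raw data every time, so a correction to the data or the analysis flows straight through to the published image.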

4. GitHub is great, but be specific about your plans.

Git is a popular version control system, and GitHub is a web-based service for hosting Git repositories. GitHub is an effective way to track and maintain files, particularly text-based files like software code. Using GitHub is great for data management, but if you talk about GitHub in your DMP, include the details. GitHub offers public and private repositories. Which will you use? Will you be making your software code available in a public repository as you work or releasing versions periodically to a public repository from your private one? Will you release the files publicly only once you've published? Maybe you'll use GitHub for a private repository and make the software available somewhere else when you publish. Don't make the reviewers guess!

>> Interested in learning more about Git? The Libraries' Data Management Services group hosts Software Carpentry workshops in conjunction with departments and research groups on campus. These two-day workshops include half a day of hands-on instruction about Git. If you are interested in coordinating with us to host a Software Carpentry workshop, please contact Amy Hodge (amyhodge@stanford.edu).

>> The NSF Biology directorate has recently released updated guidance on data management plans. The list of resources in this guide includes the training offered by Software Carpentry!

5. Consider something robust for long-term discoverability, access, and preservation: The Stanford Digital Repository (SDR). 

You don't want to worry about where your old data files are or how to fulfill a request for data you haven't thought about in years. If you want to preserve your research outputs some place safe where anyone can not only discover that they exist, but also access and download the data files for themselves, consider the Stanford Digital Repository. Your data in the SDR will be assigned its own unique identifier and persistent URL (PURL) at which the data will always be available -- no 404 errors for you! And preservation and sharing of research via the SDR is a great way to meet your data management plan requirements! If you'd like to learn about using the SDR, please contact us at sdr-contact@lists.stanford.edu.