Case study: Advanced metadata

Domain-specific repositories, such as the Protein Data Bank (PDB), often require the submission of highly structured metadata along with data files. That structure is what enables users to perform specialized searches within these repositories. For example, in PDB you can search for all the ligases from mice that were determined by X-ray crystallography at a resolution of 2.5 Angstroms or better (there were 12 last time I checked!). If everyone submitted data in whatever format they wanted, this kind of searching would not be possible.
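To make the point concrete, here is a minimal Python sketch of why consistent fields matter for searching. It is not how PDB implements its search: the `Entry` fields, the `search` function, and the three records are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """One hypothetical repository record; field names are assumptions."""
    entry_id: str
    enzyme_class: str
    organism: str
    method: str
    resolution_angstrom: float


# Made-up records, not real PDB entries.
ENTRIES = [
    Entry("demo-1", "ligase", "Mus musculus", "X-ray crystallography", 2.1),
    Entry("demo-2", "ligase", "Mus musculus", "X-ray crystallography", 3.0),
    Entry("demo-3", "ligase", "Homo sapiens", "X-ray crystallography", 1.8),
]


def search(entries, enzyme_class, organism, method, max_resolution):
    """Return entries matching every criterion.

    A smaller resolution value means a better-resolved structure,
    so "2.5 Angstroms or better" means resolution <= 2.5.
    """
    return [
        e for e in entries
        if e.enzyme_class == enzyme_class
        and e.organism == organism
        and e.method == method
        and e.resolution_angstrom <= max_resolution
    ]


matches = search(ENTRIES, "ligase", "Mus musculus", "X-ray crystallography", 2.5)
print([e.entry_id for e in matches])
```

The filter only works because every record stores the organism, method, and resolution in the same place and in the same form. With free-form submissions there would be nothing consistent to compare against.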

Protein Data Bank screen shot by Amy Hodge

The image below shows part of the metadata file for the crystal structure shown above. The complete file contains about 20,000 lines, many of which contain structure information generated during the experimental data capture. You can see that the metadata file includes specific categories that are filled in with specific data in defined formats.

Protein Data Bank metadata screen shot by Amy Hodge
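As a rough illustration of "specific categories filled in with specific data in defined formats," the sketch below reads a few `_category.item value` lines into a nested dictionary. The sample text is invented, written in the style of PDB's mmCIF format, and the `parse_items` helper is a hypothetical simplification; a real metadata file has loops, multi-line values, and quoting rules this does not handle.

```python
# Invented sample in the style of mmCIF key-value lines (an assumption;
# it is not copied from the file in the screenshot).
SAMPLE = """\
_exptl.method 'X-RAY DIFFRACTION'
_refine.ls_d_res_high 2.10
_entity_src_gen.pdbx_gene_src_scientific_name 'Mus musculus'
"""


def parse_items(text):
    """Collect simple `_category.item value` lines as {category: {item: value}}."""
    items = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("_"):
            continue  # skip anything that is not a data item line
        parts = line.split(None, 1)  # split name from value at first whitespace
        if len(parts) != 2:
            continue  # name with no value on the same line; ignored here
        name, value = parts
        category, _, item = name.lstrip("_").partition(".")
        # Drop surrounding quotes, which wrap values containing spaces.
        items.setdefault(category, {})[item] = value.strip().strip("'\"")
    return items


metadata = parse_items(SAMPLE)
print(metadata["exptl"]["method"])
print(float(metadata["refine"]["ls_d_res_high"]))
```

Because each value sits under a predictable category and item name, a program can pull out the experimental method or the resolution without guessing where the depositor put it.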

If you are interested in submitting data to a domain-specific repository, data services staff would be happy to advise you about appropriate repositories for your data and the proper formatting of your submissions. For details, check out the information on our Consulting page, contact us, or visit our page on advanced metadata.