Introduction to metadata standards
To submit your research to a data repository, you may be required to format your metadata according to a metadata standard. Consult the repository you plan to use to determine its metadata requirements.
Metadata structures are often referred to as "schemas." A schema defines a set of elements for describing the data. The completed metadata are often recorded in a machine-readable language such as XML.
For example, the Dublin Core schema defines the following 15 elements:
|Contributor||An entity responsible for making contributions to the resource.|
|Coverage||The spatial or temporal topic of the resource, the spatial applicability of the resource, or the jurisdiction under which the resource is relevant.|
|Creator||An entity primarily responsible for making the resource.|
|Date||A point or period of time associated with an event in the life cycle of the resource.|
|Description||An account of the resource.|
|Format||The file format, physical medium, or dimensions of the resource.|
|Identifier||An unambiguous reference to the resource within a given context.|
|Language||A language of the resource.|
|Publisher||An entity responsible for making the resource available.|
|Relation||A related resource.|
|Rights||Information about rights held in and over the resource.|
|Source||A related resource from which the described resource is derived.|
|Subject||The topic of the resource.|
|Title||A name given to the resource.|
|Type||The nature or genre of the resource.|
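As a rough illustration of how elements like these end up in a machine-readable record, the sketch below writes a few Dublin Core elements to XML with Python's standard library. The namespace URI is the published Dublin Core element namespace; the `metadata` wrapper element, the `build_dc_record` helper, and the sample values are assumptions made for this example, and a real repository may require a different wrapper or additional elements.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_dc_record(fields):
    """Return an XML string with one dc:<element> per (name, value) pair."""
    # The wrapper element name is an assumption; repositories define their own.
    root = ET.Element("metadata")
    for name, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical dataset description, for illustration only.
record = build_dc_record([
    ("title", "Stream temperature readings, 2019"),
    ("creator", "Example, A."),
    ("date", "2019-12-31"),
    ("format", "text/csv"),
    ("language", "en"),
])
print(record)
```

Each tuple maps one element name from the table to a value, so adding another element to the record only means adding another pair to the list.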
If you are not using a standard metadata schema whose details are widely known and easily accessible to other researchers, be sure that you preserve the schema itself and its documentation, along with the data and metadata. By doing so, you will help ensure that you and others are able to fully understand and reuse your data in the future.
Examples of metadata standards
The following are several well-known and frequently used metadata standards.
- Dublin Core: a general-purpose metadata standard for describing networked resources
- Metadata Object Description Schema (MODS): a bibliographic element set that may be used for a variety of purposes, particularly library applications. It is often used together with the Metadata Encoding and Transmission Standard (METS), a separate standard for packaging the descriptive, administrative, and structural metadata of digital objects
- Federal Geographic Data Committee's Content Standard for Digital Geospatial Metadata (FGDC CSDGM): a U.S. federal standard for describing geospatial data. The corresponding international standard is ISO 19115
- Encoded Archival Description (EAD): a standard for the encoding of finding aids for use in a networked environment
- Data Documentation Initiative standard (DDI): an international XML-based standard for the content, presentation, transport, and preservation of documentation (i.e., metadata) for data sets in the social and behavioral sciences
We can assist you in selecting a metadata standard that is appropriate for your field of research. See our consultations page for more information.
Ontologies and controlled vocabularies
Ontologies are shared vocabularies that are used to describe components of a particular discipline and the relationships among these components. By using ontologies, you make it easier for others (or even the future you) to understand your data. Controlled vocabularies, on the other hand, are merely lists of predefined, authorized terms.
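The difference between the two can be sketched with simple data structures. In this minimal example, the habitat terms and their relationships are invented for illustration and are not taken from any published vocabulary or ontology: the controlled vocabulary is a flat set of authorized terms, while the toy ontology also records how the terms relate to one another.

```python
# A controlled vocabulary: a flat set of authorized terms (hypothetical).
HABITAT_VOCABULARY = {"forest", "grassland", "wetland", "marsh"}

# A toy ontology: terms plus "is_a" relationships between them (hypothetical).
HABITAT_IS_A = {
    "marsh": "wetland",
    "wetland": "habitat",
    "forest": "habitat",
    "grassland": "habitat",
}


def is_authorized(term):
    """A vocabulary can only say whether a term is on the list."""
    return term in HABITAT_VOCABULARY


def broader_terms(term):
    """An ontology can also say how a term relates to broader terms."""
    chain = []
    while term in HABITAT_IS_A:
        term = HABITAT_IS_A[term]
        chain.append(term)
    return chain


print(is_authorized("marsh"))
print(broader_terms("marsh"))
```

With the relationships recorded, a search for data about wetlands could also find records tagged only as "marsh", which a flat list of terms cannot support on its own.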
In addition to using a metadata standard, you may wish (or be required) to use ontologies or controlled vocabularies to create your metadata. For example, if you use Dublin Core as your metadata schema, it is recommended that you use the Internet Media Types list, a controlled vocabulary, to enter information in the "Format" element. It is also recommended that you use a controlled vocabulary for subject terms, but the choice of vocabulary is up to you.
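One practical benefit of a controlled vocabulary is that metadata values can be checked automatically. The sketch below validates a "Format" value against a small, hand-picked subset of registered internet media types; the `ALLOWED_FORMATS` set and the `check_format` helper are assumptions for this example, and a real workflow would check against the full registered list.

```python
# Illustrative subset of registered internet media types, not the full list.
ALLOWED_FORMATS = {
    "text/csv",
    "text/plain",
    "application/pdf",
    "application/xml",
    "image/tiff",
}


def check_format(value):
    """Return the normalized value if it is authorized, else raise ValueError."""
    normalized = value.strip().lower()
    if normalized not in ALLOWED_FORMATS:
        raise ValueError(f"{value!r} is not in the controlled vocabulary")
    return normalized


print(check_format("text/csv"))

try:
    check_format("CSV file")
except ValueError as error:
    print(error)
```

Free-text entries such as "CSV file" are rejected, which keeps the "Format" values consistent across records and easier for software to interpret.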
Here are some examples of ontologies and controlled vocabularies currently in use in a variety of disciplines:
- BioPortal: a repository of biomedical ontologies maintained by the U.S. National Center for Biomedical Ontology, hosted at Stanford.
- Gene Ontology: a bioinformatics initiative that aims to standardize the representation of gene and gene product attributes across species and databases.
- Medical Subject Headings (MeSH): a controlled vocabulary used for indexing articles for PubMed.
- Web Ontology Language (OWL): a W3C language for defining and publishing ontologies on the semantic web.
- Getty Thesaurus of Geographic Names (TGN): a controlled vocabulary that includes names and other information about places, administrative political entities, and physical features.
- RFC 4646: a specification of language tags (such as "en" or "fr-CA") for identifying the language of an object. It has since been superseded by RFC 5646.
- Chemical Entities of Biological Interest (ChEBI): an ontology of small chemical compounds.
- Microarray Gene Expression Data (MGED) Ontology: an ontology designed to describe microarray experiments.
- Internet Media Types (MIME types): a controlled vocabulary of internet media types, maintained by IANA.
- Environmental Ontology (EnvO): an ontology used to describe environments for any organism or biological sample.
- Name Reaction, Chemical Methods, and Molecular Processes Ontologies: ontologies for chemistry from the Royal Society of Chemistry (RSC).
Advanced metadata case study
View the advanced metadata case study for a real-life example of metadata creation using metadata schemas, controlled vocabularies, and ontologies.