An important stage in the data lifecycle is documenting your data and the methods you used to create it. Thorough documentation makes your data more accessible to others by describing what the data contains and explaining how it was produced. Documentation also serves as a reminder of your methodology when you revisit a project after a long period of time.


Metadata is data that describes or provides information about other data. Creating reliable metadata is one of the most important steps in enabling users to identify, locate, retrieve, and use your data. Common metadata elements include the title or filename, author, date of creation, and descriptive tags. These elements are used to classify, link, and index your data. Dataedo provides a useful brief summary of metadata examples.
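As a rough illustration, the elements listed above can be captured in a small machine-readable record stored next to the data. The sketch below is only one possible layout: the function name `build_metadata`, the field names, and the sample values are hypothetical and do not follow any particular metadata standard.

```python
import json
from datetime import date


def build_metadata(title, author, created, tags):
    """Return a simple metadata record as a plain dictionary.

    The field names are illustrative only; a real project would use the
    element names required by its chosen metadata standard.
    """
    return {
        "title": title,
        "author": author,
        "date_created": created.isoformat(),  # ISO 8601 date string
        "tags": sorted(tags),                 # sorted for stable output
    }


# Hypothetical example values
record = build_metadata(
    title="survey_responses.csv",
    author="Example Researcher",
    created=date(2024, 1, 15),
    tags=["survey", "pilot"],
)

# Serialize to JSON so the record can be saved alongside the data file
metadata_json = json.dumps(record, indent=2)
print(metadata_json)
```

Saving a record like this in a plain-text format keeps it readable by both people and indexing tools, whichever standard you eventually adopt.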

There are many different metadata standards in use today (the Digital Curation Centre maintains a list of standards), and the one that makes the most sense for your project will likely depend on the type of data being produced. Often the funding agency you are working with will dictate which standard to use. The Research Data Alliance (RDA) makes available a large (but not exhaustive) directory of discipline-specific metadata standards. Some of the most widely used standards are:

Data Dictionaries

A data dictionary acts as a descriptive companion to a dataset. In contrast to metadata, which uses a controlled vocabulary to link your data, a data dictionary provides detailed explanations of how the data was created and how to interpret and use it. As such, the dictionary is valuable not only to external users as a guide, but also to members of the project team as a reminder of important steps taken and decisions made.

Making a data dictionary is an iterative process. The dictionary should be created alongside your data and revised throughout the life of the project, as the data changes or as new components are added.
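One way to keep the dictionary in step with the data is to generate a skeleton from the dataset itself and fill in the descriptions by hand. The sketch below assumes the data is a CSV file with a header row; the function name `dictionary_skeleton`, the entry fields, and the sample columns are hypothetical.

```python
import csv
import io


def dictionary_skeleton(csv_text):
    """Return one data dictionary entry per column of a CSV with a header row.

    Each entry has the column name, an example value taken from the first
    data row, and empty fields for the project team to complete.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    first_row = next(reader, None)  # None if there are no data rows
    entries = []
    for name in reader.fieldnames or []:
        entries.append({
            "variable": name,
            "example": first_row[name] if first_row else "",
            "description": "",  # to be written by the project team
            "units": "",        # to be written by the project team
        })
    return entries


# Hypothetical sample data
sample = "site_id,temp_c,collected_on\nA1,12.5,2024-01-15\n"
entries = dictionary_skeleton(sample)
for entry in entries:
    print(f'{entry["variable"]}: example={entry["example"]!r}')
```

Re-running a script like this whenever columns are added makes it easier to spot variables that still lack a description.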

Data dictionaries are often stored in a file whose name indicates that it should be read alongside, or before, the data itself (e.g. _readme_.txt). Some best practices for creating a data dictionary can be found at