Documenting your data means providing information that allows other users to understand and use your data. It is a requirement for open data, as reflected in the FAIR principles (Findable, Accessible, Interoperable, Reusable). Data documentation can take different forms, from a simple text document (often called a README file) to information embedded within the files themselves, or even structured descriptive lists such as a catalogue.
What is a README File?
A README file is usually a plain text file named README.txt, located at the root of your dataset. Its name signals that any potential user of your data should read it before opening any other part of the dataset.
The main README file explains the contents and structure of your dataset, and gives enough information for a potential user to determine whether the data is of interest to them. If your dataset requires a codebook, the codebook can be included in the README file. You can also create secondary README files in subfolders to document specific parts of your data.
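As an illustration of the structure described above, the following Python sketch assembles a minimal README text. The section layout, file names, and contact address are hypothetical placeholders, not a required template; adapt them to your own dataset.

```python
def build_readme(title, description, files, contact):
    """Return README text: a title, a short description, and a file listing."""
    lines = [title, "=" * len(title), "", description, "", "Files", "-----"]
    for name, purpose in files.items():
        lines.append(f"{name}: {purpose}")
    lines += ["", f"Contact: {contact}"]
    return "\n".join(lines) + "\n"


# All values below are made-up examples.
readme_text = build_readme(
    title="Example survey dataset",
    description="Hypothetical household survey responses, one row per respondent.",
    files={
        "data/responses.csv": "survey responses",
        "docs/codebook.txt": "variable definitions and missing-data codes",
    },
    contact="data-owner@example.org",
)

print(readme_text)
```

Saving the resulting text as README.txt in the top-level folder of the dataset puts it where users will look first.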
What’s in a Data Dictionary?
A data dictionary is used to catalog and communicate the structure and content of data and provides meaningful descriptions for individually named data objects.
Data dictionary contents can vary but typically include some or all of the following:
- A listing of data objects (names and definitions).
- Detailed properties of data elements (data type, size, nullability, optionality, indexes).
- Entity-relationship diagrams.
- Reference data (classification and descriptive domains).
- Missing data and quality-indicator codes.
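The items listed above can be sketched as a small machine-readable data dictionary. The example below is one possible layout, assuming a hypothetical dataset with two elements; the field names (`data_type`, `size`, `nullable`, `missing_code`) and the values are illustrative choices, not a standard.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class DataElement:
    name: str          # name of the data object
    definition: str    # human-readable meaning
    data_type: str     # e.g. "string", "float"
    size: int          # maximum length or width
    nullable: bool     # whether the value may be absent
    missing_code: str  # value that marks missing data, "" if none


# Hypothetical entries for illustration only.
elements = [
    DataElement("site_id", "Unique identifier of the sampling site", "string", 8, False, ""),
    DataElement("water_temp_c", "Water temperature in degrees Celsius", "float", 6, True, "-999"),
]


def to_csv(rows):
    """Serialise the data dictionary as CSV text, one row per data element."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[f.name for f in fields(DataElement)],
        lineterminator="\n",
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()


dictionary_csv = to_csv(elements)
print(dictionary_csv)
```

A CSV file like this can be stored next to the data and opened in any spreadsheet tool, which keeps the dictionary readable for people as well as programs.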
How Are Data Dictionaries Used?
- Documentation: provide data structure details for users, developers, and other stakeholders.
- Communication: equip users with a common vocabulary and definitions for shared data, data standards, and data flow and exchange, and help developers gauge the impact of schema changes.
- Application Design: help application developers create forms and reports with proper data types and controls, and ensure that navigation is consistent with data relationships.
- Systems Analysis: enable analysts to understand overall system design and data flow, and to find where data interact with various processes or components.
- Data Integration: provide clear definitions of data elements, which give the contextual understanding needed when deciding how to map one data system to another, or whether to subset, merge, stack, or transform data for a specific use.
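To make the data integration use more concrete, the sketch below compares the data dictionaries of two hypothetical systems and reports which elements line up, which conflict, and which exist on one side only. The element names and properties are invented for illustration, and matching purely by name is a simplifying assumption; real mappings usually also need the written definitions.

```python
# Two made-up data dictionaries, keyed by element name.
source = {
    "site_id": {"data_type": "string", "nullable": False},
    "water_temp_c": {"data_type": "float", "nullable": True},
    "observer": {"data_type": "string", "nullable": True},
}
target = {
    "site_id": {"data_type": "string", "nullable": False},
    "water_temp_c": {"data_type": "integer", "nullable": True},
    "sample_date": {"data_type": "date", "nullable": False},
}


def compare_dictionaries(source, target):
    """Group element names by how they relate across the two dictionaries."""
    shared = sorted(source.keys() & target.keys())
    return {
        # Same name and identical properties: can be mapped directly.
        "compatible": [n for n in shared if source[n] == target[n]],
        # Same name but different properties: needs a transformation or a decision.
        "conflicting": [n for n in shared if source[n] != target[n]],
        "only_in_source": sorted(source.keys() - target.keys()),
        "only_in_target": sorted(target.keys() - source.keys()),
    }


report = compare_dictionaries(source, target)
for category, names in report.items():
    print(f"{category}: {names}")
```

In this example `water_temp_c` is flagged as conflicting because the two systems declare different data types, which is the kind of mismatch a data dictionary helps surface before data are merged.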
Sources:
Describe (Metadata/Documentation) (U.S. Geological Survey): https://www.usgs.gov/data-management/describe-metadatadocumentation
README File (Geneva Graduate Institute): https://libguides.graduateinstitute.ch/rdm/readme
Last updated: 21/02/2024