How to Manage Data Documentation
Poom Wettayakorn
data-management
Data documentation involves providing detailed information about your data to ensure it can be understood, accessed, and reused properly. Here are some key steps to effectively document data:
1. Start Early
Begin documenting your data at the beginning of your project and continue adding information as you progress. It is easier to capture details as you go rather than trying to remember everything later on. [1]
2. Levels of Documentation
Project-level Documentation: Include information about the study's aims, research questions, data collection methods, instruments used, data processing, data access, and more.
File-level Documentation: Describe the contents of folders or datasets, including data types, file formats, and relationships between files. A README.txt file is commonly used for this purpose.
Variable-level Documentation: Provide definitions and explanations of variables, values, units of measurement, missing values, and any codes or abbreviations used.
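As a rough illustration of variable-level documentation, the sketch below stores each variable's definition, unit, value codes, and missing-value codes in a small structure and renders them as a plain-text codebook. The `VariableDoc` class, the `render_codebook` function, and the example variables are hypothetical names invented for this sketch, not part of any standard.

```python
from dataclasses import dataclass, field


@dataclass
class VariableDoc:
    """Variable-level documentation for one column (hypothetical structure)."""
    name: str
    definition: str
    unit: str = ""
    value_codes: dict = field(default_factory=dict)    # e.g. {1: "yes", 2: "no"}
    missing_codes: dict = field(default_factory=dict)  # e.g. {-99: "refused"}


def render_codebook(variables):
    """Return a plain-text codebook with one block of lines per variable."""
    lines = []
    for var in variables:
        lines.append(f"{var.name}: {var.definition}")
        if var.unit:
            lines.append(f"  unit: {var.unit}")
        for code, label in sorted(var.value_codes.items()):
            lines.append(f"  value {code} = {label}")
        for code, reason in sorted(var.missing_codes.items()):
            lines.append(f"  missing {code} = {reason}")
    return "\n".join(lines)


# Illustrative variables only; not taken from a real study.
codebook = render_codebook([
    VariableDoc("height_cm", "Standing height of respondent",
                unit="centimetres", missing_codes={-99: "not measured"}),
    VariableDoc("smoker", "Currently smokes", value_codes={1: "yes", 2: "no"}),
])
print(codebook)
```

The rendered text could be pasted into a README.txt or kept alongside the dataset as a separate codebook file.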
3. What to Document
Essential elements to document include sampling details, fieldwork dates, respondent tracking, issues encountered during data collection, data cleaning procedures, variable construction, and how each dataset was created. [3]
What's important to document?
Context of data collection
Data collection methodology
Structure and organization of data files
Data validation and quality assurance
Data manipulations applied to the raw data, through to analysis
Data confidentiality, access and use conditions
Data-level documentation
Variable names and descriptions
Definition of codes and classification schemes
Codes of, and reasons for, missing values
Definitions of specialty terminology and acronyms
Algorithms used to transform data
File format and software used
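Documenting the codes for missing values, and the reason behind each, is easier to appreciate with a small example. The sketch below assumes a hypothetical coding scheme (`-99`, `-98`, `-97`) and shows how a documented mapping lets an analyst separate real values from coded gaps; the codes, reasons, and function name are illustrative assumptions.

```python
from collections import Counter

# Hypothetical missing-value codes and the reason each one stands for.
MISSING_CODES = {
    -99: "refused to answer",
    -98: "not applicable",
    -97: "interviewer error",
}


def split_missing(values, missing_codes=MISSING_CODES):
    """Separate valid values from coded missing values and tally the reasons."""
    valid = [v for v in values if v not in missing_codes]
    reasons = Counter(missing_codes[v] for v in values if v in missing_codes)
    return valid, dict(reasons)


# Illustrative ages with three coded gaps.
valid_ages, missing_summary = split_missing([34, -99, 51, -98, 27, -99])
print(valid_ages)
print(missing_summary)
```

Without the documented mapping, a value such as `-99` could easily be mistaken for a real measurement and distort summary statistics.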
4. Documentation Formats
README File: Contains critical information about data files, including citation details, variable definitions, methodological information, and more. It is typically in .txt format.
Data Dictionary: Describes the names, definitions, and attributes of data elements, commonly used for tabular data.
Commented Code: In-line comments in computer code that provide descriptions of the code's function not evident from the code itself.
Data Documentation Software: Dedicated tools such as Datascale's data catalog and discovery tools.
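For tabular data, a data dictionary can be started automatically and then completed by hand. The following is a minimal sketch, assuming the data is available as CSV text: it lists each column, guesses a coarse type, and leaves the definition blank for the author to fill in. The function names and the sample columns are hypothetical, and the type inference is deliberately simple.

```python
import csv
import io


def infer_type(values):
    """Guess a coarse type from string values; empty or absent cells are ignored."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return "unknown"
    for caster, label in ((int, "integer"), (float, "float")):
        try:
            for v in non_empty:
                caster(v)
            return label
        except ValueError:
            continue
    return "text"


def data_dictionary_skeleton(csv_text):
    """Return one entry per column: name, inferred type, and a blank definition."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = reader.fieldnames or []
    return [
        {"name": col, "type": infer_type([r[col] for r in rows]), "definition": ""}
        for col in columns
    ]


# Illustrative data only; the second row has no height recorded.
sample = "respondent_id,height_cm,region\n1,172.5,north\n2,,south\n3,168,north\n"
skeleton = data_dictionary_skeleton(sample)
for entry in skeleton:
    print(entry)
```

The inferred types are only a starting point; definitions, units, and codes still need to be written by someone who knows the data.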
References
[1] https://www.imperial.ac.uk/…/organising-and-describing-data/documenting-data
[2] https://www.reddit.com/…/how_does_your_company_handle_data_documentation
[3] https://data.library.arizona.edu/…/best-practices/data-documentation-readme-metadata