How to Manage Data Documentation

Here are some key steps to effectively document data.


Poom Wettayakorn

Apr 15, 2024


data-management


Data documentation involves providing detailed information about your data to ensure it can be understood, accessed, and reused properly. Here are some key steps to effectively document data:

1. Start Early

Begin documenting your data at the beginning of your project and continue adding information as you progress. It is easier to capture details as you go rather than trying to remember everything later on. [1]

2. Levels of Documentation

  1. Project-level Documentation: Include information about the study's aims, research questions, data collection methods, instruments used, data processing, data access, and more.

  2. File-level Documentation: Describe the contents of folders or datasets, including data types, file formats, and relationships between files. A README.txt file is commonly used for this purpose.

  3. Variable-level Documentation: Provide definitions and explanations of variables, values, units of measurement, missing values, and any codes or abbreviations used.
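As a rough illustration of file-level documentation, the sketch below builds the text of a README.txt that lists each file in a folder with its format and a short description. The function name `build_readme` and the `descriptions` mapping are hypothetical, introduced only for this example; adapt the layout to whatever your project or repository expects.

```python
from pathlib import Path


def build_readme(folder: Path, descriptions: dict) -> str:
    """Return README text listing each file in `folder` with its format.

    `descriptions` maps a file name to a human-written description
    (a hypothetical structure chosen for this sketch).
    """
    lines = [f"README for {folder.name}", "", "Files:"]
    for path in sorted(folder.iterdir()):
        # Skip sub-folders and the README itself.
        if not path.is_file() or path.name == "README.txt":
            continue
        # Use the file extension as a simple stand-in for the file format.
        fmt = path.suffix.lstrip(".") or "unknown"
        # Flag undocumented files so the gap is visible, not silent.
        desc = descriptions.get(path.name, "TODO: describe this file")
        lines.append(f"- {path.name} (format: {fmt}): {desc}")
    return "\n".join(lines) + "\n"
```

Writing the result with `(folder / "README.txt").write_text(...)` keeps the description next to the data it describes, and the TODO markers show which files still need one.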

3. What to Document

Essential elements to document include sampling details, fieldwork dates, respondent tracking, issues encountered during data collection, data cleaning procedures, variable construction, and how each dataset was created. [3]

What's important to document?

  1. Context of data collection

  2. Data collection methodology

  3. Structure and organization of data files

  4. Data validation and quality assurance

  5. Data manipulations applied to the raw data, up to and including analysis

  6. Data confidentiality, access and use conditions

Data-level documentation

  1. Variable names and descriptions

  2. Definition of codes and classification schemes

  3. Codes of, and reasons for, missing values

  4. Definitions of specialty terminology and acronyms

  5. Algorithms used to transform data

  6. File format and software used
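Several of the data-level items above (variable names, descriptions, and missing-value codes) can be gathered into a simple data dictionary. The following is a minimal sketch using only the Python standard library; the missing-value codes in `MISSING_CODES`, the function name `build_data_dictionary`, and the entry fields are assumptions made for illustration, not a standard.

```python
import csv
import io

# Hypothetical missing-value codes; replace with the codes your dataset uses.
MISSING_CODES = {"", "NA", "-99"}


def build_data_dictionary(csv_text: str, descriptions: dict) -> list:
    """Return one dictionary entry per column of a CSV given as text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for name in reader.fieldnames or []:
        # Short rows yield None for absent cells; treat those as empty.
        values = [(row[name] or "") for row in rows]
        n_missing = sum(1 for v in values if v in MISSING_CODES)
        # First non-missing value, kept as an example for readers.
        example = next((v for v in values if v not in MISSING_CODES), "")
        entries.append({
            "variable": name,
            "description": descriptions.get(name, "TODO: describe"),
            "n_missing": n_missing,
            "example": example,
        })
    return entries
```

Each entry can then be written out as a row of a CSV or a table in the README, with units, code definitions, and acronyms added by hand where the data alone cannot supply them.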

4. Documentation Formats

  • README File: Contains critical information about data files, including citation details, variable definitions, methodological information, and more. It is typically in .txt format.

  • Data Dictionary: Describes the names, definitions, and attributes of data elements, commonly used for tabular data.

  • Commented Code: In-line comments in computer code that provide descriptions of the code's function not evident from the code itself.

  • Data documentation software: Datascale's data catalog & discovery tools
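To make the "Commented Code" format concrete, here is a small sketch of a transformation whose comments record what the code alone does not reveal: what a code value means and why a unit conversion happens. The function `recode_income`, the `-99` code, and the "thousands" unit are all hypothetical and exist only for this example.

```python
from typing import Optional


def recode_income(raw: str) -> Optional[float]:
    """Convert a raw survey income field to whole currency units."""
    # "-99" is the (hypothetical) survey code for "refused to answer".
    # Treat it as missing so it is never mistaken for a negative income.
    if raw.strip() in {"", "-99"}:
        return None
    # Income was collected in thousands; convert to whole units so that
    # every derived file shares the same unit of measurement.
    return float(raw) * 1000
```

The comments explain the reasons behind each step, which is exactly the information a data dictionary or README would otherwise have to repeat.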

Modern Data Dictionary

References

  1. https://www.imperial.ac.uk/…/organising-and-describing-data/documenting-data

  2. https://www.reddit.com/…/how_does_your_company_handle_data_documentation

  3. https://data.library.arizona.edu/…/best-practices/data-documentation-readme-metadata

  4. https://guides.library.illinois.edu/introdata/documentation
