BigQuery Information Schema: Primary Key and Foreign Key

Part 1: how to query DDL, metadata, and constraints from BigQuery's information schema to create an ER diagram.

Part 1: how to query DDL, metadata, and constraints from BigQuery's information schema to create an ER diagram.

Norapath Arjanurak

Jun 19, 2024

Jun 19, 2024

data-modeling

👋 Hi there, welcome to Datascale: SQL+Data modeling. We help create database diagrams and map out data models from existing schema and SQL queries 🌱. Find us on LinkedIn 🎉

BigQuery Information Schema

In database system, it is essential to understand the structure and relationships within the dataset. This can be enhanced by creating the lineage or entity-relationship diagram.

This blog series will go through the process of creating the ER diagram from a BigQuery dataset. In Part 1, we will focus on extracting Data Definition Language (DDL), metadata, and constraints using BigQuery's Information Schema.

How to create an ER diagram from a BigQuery dataset?

When working with a dataset, writing the queries can be challenging due to the difficulty of seeing a clear picture of the table structures and relationships. Also, manually exploring metadata can be time-consuming and cause errors. Indeed, the lack of built-in ER diagram generation support in BigQuery makes it hard to visualize the data schema.

To overcome this challenge, we will use BigQuery Information Schema to extract the necessary metadata (table and column names & data types) and constraints (PK/FK). This metadata will be used to create an ERD for visualizing the relationship between entities. The following section will provide the steps to extract the metadata which are retrieving the DDL, getting detailed metadata about all columns and getting constraints.

1. Get DDL (Data Definition Language)

To retrieve the DDL for a specific table, you can query the INFORMATION_SCHEMA.TABLES view. The DDL includes the SQL statement used to create the table as the following code.

SELECT table_name, ddl
FROM `bigquery-public-data.thelook_ecommerce.INFORMATION_SCHEMA.TABLES`;

Here is the query result:

DDL is used to define the structure and the table name will be used to obtain the necessary information such as the column name and its constraints.

2. Get Metadata

You can query the INFORMATION_SCHEMA.COLUMNS view to obtain details about all columns in a dataset, including column names and data types.

SELECT column_name, data_type 
FROM `bigquery-public-data.thelook_ecommerce.INFORMATION_SCHEMA.COLUMNS` 
WHERE table_name = '{{ table_name }}';

Figure1. the SQL code for retrieving the column names and data types

3. Check Constraints

To get the constraint for the dataset, you can use the following command.

SELECT *
FROM `bigquery-public-data.thelook_ecommerce.INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE`;

📌 However, sometimes there're no PF and FK constraints in the dataset (the query shows no data available). To solve this problem for this dataset, we will provide the DDL that describes the constraint for primary and foreign keys.

Before we start, if you are working on a public dataset or a source where you don't have permission to create keys, you can create your own dataset and copy the data to your destination.

Here is how you can alter the table and set the necessary constraints. Firstly, you need to set up the query destination to be saved in your local environment. You can follow this tutorial on YouTube.

Once you have set up the query destination, all the tables need to be saved to your dataset as follows:

4. Create Constraints

Next, you can run the following command to set up the constraints for primary keys and foreign keys:

ALTER TABLE `your_project_name.your_dataset_name.users`
ADD PRIMARY KEY(id) NOT ENFORCED;

ALTER TABLE `your_project_name.your_dataset_name.distribution_centers`
ADD PRIMARY KEY(id) NOT ENFORCED;

ALTER TABLE `your_project_name.your_dataset_name.products`
ADD PRIMARY KEY(id) NOT ENFORCED,
ADD FOREIGN KEY(distribution_center_id) REFERENCES `your_project_name.your_dataset_name.distribution_centers`(id) NOT ENFORCED;

ALTER TABLE `your_project_name.your_dataset_name.inventory_items`
ADD PRIMARY KEY(id) NOT ENFORCED,
ADD FOREIGN KEY(product_distribution_center_id) REFERENCES `your_project_name.your_dataset_name.distribution_centers`(id) NOT ENFORCED,
ADD FOREIGN KEY(product_id) REFERENCES `your_project_name.your_dataset_name.products`(id) NOT ENFORCED;

ALTER TABLE `your_project_name.your_dataset_name.events`
ADD PRIMARY KEY(id) NOT ENFORCED,
ADD FOREIGN KEY(user_id) REFERENCES `your_project_name.your_dataset_name.users`(id) NOT ENFORCED;

ALTER TABLE `your_project_name.your_dataset_name.orders`
ADD PRIMARY KEY(order_id) NOT ENFORCED,
ADD FOREIGN KEY(user_id) REFERENCES `your_project_name.your_dataset_name.users`(id) NOT ENFORCED;

ALTER TABLE `your_project_name.your_dataset_name.order_items`
ADD PRIMARY KEY(id) NOT ENFORCED,
ADD FOREIGN KEY(order_id) REFERENCES `your_project_name.your_dataset_name.orders`(order_id) NOT ENFORCED,
ADD FOREIGN KEY(user_id) REFERENCES `your_project_name.your_dataset_name.users`(id) NOT ENFORCED,
ADD FOREIGN KEY(product_id) REFERENCES `your_project_name.your_dataset_name.products`(id) NOT ENFORCED,
ADD FOREIGN KEY(inventory_item_id) REFERENCES `your_project_name.your_dataset_name.inventory_items`(id) NOT ENFORCED;

Note: you can use Ctrl/Cmd+F to search and replace the value of the variables (your_project_name.your_dataset_name) all in one.

5. Query Constraints

After that, you can visualize the constraint with the following commands:

TABLE_CONSTRAINTS

As per BigQuery's doc: "The TABLE_CONSTRAINTS view contains the primary and foreign key relations in a BigQuery dataset."

SELECT 
  table_name,
  constraint_type,
  constraint_name
FROM `datascalehq.thelook_ecommerce.INFORMATION_SCHEMA.TABLE_CONSTRAINTS`;

KEY_COLUMN_USAGE

As per BigQuery's doc: "The KEY_COLUMN_USAGE contains columns of the tables from TABLE_CONSTRAINTS that are constrained as keys by PRIMARY KEY and FOREIGN KEY constraints."

SELECT
  table_name,
  column_name,
  constraint_name
FROM `datascalehq.thelook_ecommerce.INFORMATION_SCHEMA.KEY_COLUMN_USAGE`;

This image below showed how TABLE_CONSTRAINTS and KEY_COLUMN_USAGE are related.

CONSTRAINT_COLUMN_USAGE

As per BigQuery's doc: "The CONSTRAINT_COLUMN_USAGE view contains all columns used by constraints. For PRIMARY KEY constraints, these are the columns from the KEY_COLUMN_USAGE view. For FOREIGN KEY constraints, these are the columns of the referenced tables."

SELECT
  table_name,
  column_name,
  constraint_name
FROM `datascalehq.thelook_ecommerce.INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE`;

This image below showed how CONSTRAINT_COLUMN_USAGE and KEY_COLUMN_USAGE are related.

E.g., Let's look at LINE NO. 15-16 (events.id & events.user_id), they can be mapped into an ER diagram as follows:

6. Put It All Together

SELECT 
  kcu.table_name,
  kcu.column_name,
  kcu.constraint_name,
  ccu.table_name AS referenced_table_name,
  ccu.column_name AS referenced_column_name,
  tc.constraint_type
FROM 
  `datascalehq.thelook_ecommerce.INFORMATION_SCHEMA.KEY_COLUMN_USAGE` kcu
JOIN 
  `datascalehq.thelook_ecommerce.INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE` ccu
ON 
  kcu.constraint_name = ccu.constraint_name
JOIN 
  `datascalehq.thelook_ecommerce.INFORMATION_SCHEMA.TABLE_CONSTRAINTS` tc
ON 
  kcu.constraint_name = tc.constraint_name
WHERE 
  tc.constraint_type = 'FOREIGN KEY';

From this query, we can see all the references here:

order_items.order_id -> orders.order_id
order_items.user_id -> users.id
order_items.product_id -> products.id

With these constraints, they allow us to gain the PF & FK references from the DDL which can be used to map out a ER diagram. I.e.,

SELECT table_name, ddl
FROM `bigquery-public-data.thelook_ecommerce.INFORMATION_SCHEMA.TABLES`;

In conclusion, we discussed the challenges of working with a dataset in BigQuery and how extracting metadata using Information Schema can help. We covered how to retrieve DDL, column metadata, and constraints, which are the building blocks for creating an ER diagram.

What's Next?

In Part 2, we will continue by demonstrating how to use the extracted metadata and constraints to create an ER diagram using Datascale. We will guide you through the process of visualizing your dataset structure to have a clear and comprehensive understanding of your data relationships.

Related blogs

Contents

Section

Get a clear view of your SQL dependencies

Datascale helps reverse engineer data models from existing schema and SQL queries

Learn more

Where data gets modeled

Where data gets modeled