BigQuery Information Schema: Primary Key and Foreign Key
Norapath Arjanurak
data-modeling
Part 1: BigQuery Information Schema: Primary Key and Foreign Key
Part 3: A Guide to Transformational Modeling with A BigQuery Dataset
BigQuery Information Schema
In database system, it is essential to understand the structure and relationships within the dataset. This can be enhanced by creating the lineage or entity-relationship diagram.
This blog series will go through the process of creating the ER diagram from a BigQuery dataset. In Part 1, we will focus on extracting Data Definition Language (DDL), metadata, and constraints using BigQuery's Information Schema.
How to create an ER diagram from a BigQuery dataset?
When working with a dataset, writing the queries can be challenging due to the difficulty of seeing a clear picture of the table structures and relationships. Also, manually exploring metadata can be time-consuming and cause errors. Indeed, the lack of built-in ER diagram generation support in BigQuery makes it hard to visualize the data schema.
To overcome this challenge, we will use BigQuery Information Schema to extract the necessary metadata (table and column names & data types) and constraints (PK/FK). This metadata will be used to create an ERD for visualizing the relationship between entities. The following section will provide the steps to extract the metadata which are retrieving the DDL, getting detailed metadata about all columns and getting constraints.
1. Get DDL (Data Definition Language)
To retrieve the DDL for a specific table, you can query the INFORMATION_SCHEMA.TABLES
view. The DDL includes the SQL statement used to create the table as the following code.
Here is the query result:
DDL is used to define the structure and the table name will be used to obtain the necessary information such as the column name and its constraints.
2. Get Metadata
You can query the INFORMATION_SCHEMA.COLUMNS
view to obtain details about all columns in a dataset, including column names and data types.
Figure1. the SQL code for retrieving the column names and data types
3. Check Constraints
To get the constraint for the dataset, you can use the following command.
📌 However, sometimes there're no PF and FK constraints in the dataset (the query shows no data available). To solve this problem for this dataset, we will provide the DDL that describes the constraint for primary and foreign keys.
Before we start, if you are working on a public dataset or a source where you don't have permission to create keys, you can create your own dataset and copy the data to your destination.
Here is how you can alter the table and set the necessary constraints. Firstly, you need to set up the query destination to be saved in your local environment. You can follow this tutorial on YouTube.
Once you have set up the query destination, all the tables need to be saved to your dataset as follows:
4. Create Constraints
Next, you can run the following command to set up the constraints for primary keys and foreign keys:
Note: you can use Ctrl/Cmd+F to search and replace the value of the variables (your_project_name.your_dataset_name
) all in one.
5. Query Constraints
After that, you can visualize the constraint with the following commands:
TABLE_CONSTRAINTS
As per BigQuery's doc: "The TABLE_CONSTRAINTS
view contains the primary and foreign key relations in a BigQuery dataset."
KEY_COLUMN_USAGE
As per BigQuery's doc: "The KEY_COLUMN_USAGE
contains columns of the tables from TABLE_CONSTRAINTS
that are constrained as keys by PRIMARY KEY
and FOREIGN KEY
constraints."
This image below showed how TABLE_CONSTRAINTS
and KEY_COLUMN_USAGE
are related.
CONSTRAINT_COLUMN_USAGE
As per BigQuery's doc: "The CONSTRAINT_COLUMN_USAGE
view contains all columns used by constraints. For PRIMARY KEY
constraints, these are the columns from the KEY_COLUMN_USAGE
view. For FOREIGN KEY
constraints, these are the columns of the referenced tables."
This image below showed how CONSTRAINT_COLUMN_USAGE
and KEY_COLUMN_USAGE
are related.
E.g., Let's look at LINE NO. 15-16 (events.id
& events.user_id
), they can be mapped into an ER diagram as follows:
6. Put It All Together
From this query, we can see all the references here:
With these constraints, they allow us to gain the PF & FK references from the DDL which can be used to map out a ER diagram. I.e.,
In conclusion, we discussed the challenges of working with a dataset in BigQuery and how extracting metadata using Information Schema can help. We covered how to retrieve DDL, column metadata, and constraints, which are the building blocks for creating an ER diagram.
What's Next?
In Part 2, we will continue by demonstrating how to use the extracted metadata and constraints to create an ER diagram using Datascale. We will guide you through the process of visualizing your dataset structure to have a clear and comprehensive understanding of your data relationships.