BigQuery Information Schema: DDL to ER Diagram

Part 2: how to visualize DDL with PK & FK references from BigQuery's information schema into an ER diagram

Part 2: how to visualize DDL with PK & FK references from BigQuery's information schema into an ER diagram

Norapath Arjanurak

Jun 28, 2024

Jun 28, 2024

data-modeling

👋 Hi there, welcome to Datascale: SQL+Data modeling. We help create database diagrams and map out data models from existing schema and SQL queries 🌱. Find us on LinkedIn 🎉

BigQuery DDL to ER diagram
BigQuery DDL to ER diagram
BigQuery DDL to ER diagram

From BigQuery DDL to ER Diagram

In Part 1, we discussed how to extract metadata and constraints from BigQuery datasets using Information Schema. Now, in Part 2, we will focus on how to use this metadata to create an ER diagram. This diagram will help to clearly describe the structure and relationships within the dataset.

Creating Tables and Adding Constraints

From Part 1, we extracted the DDL and the constraints (PK and FK) from BigQuery information schema which will be used to describe the relationships between tables at the column level. The following commands show the DDL that integrates these relationships as primary and foreign keys.

CREATE TABLE `bigquery-public-data.thelook_ecommerce.distribution_centers` (
  id INT64, name STRING, latitude FLOAT64, longitude FLOAT64
);

CREATE TABLE `bigquery-public-data.thelook_ecommerce.events` (
  id INT64, user_id INT64, sequence_number INT64, session_id STRING, created_at TIMESTAMP,
  ip_address STRING, city STRING, state STRING, postal_code STRING, browser STRING, 
  traffic_source STRING, uri STRING, event_type STRING,
  FOREIGN KEY (user_id) REFERENCES bigquery-public-data.thelook_ecommerce.users(id)
);

CREATE TABLE `bigquery-public-data.thelook_ecommerce.inventory_items` (
  id INT64, product_id INT64, created_at TIMESTAMP, sold_at TIMESTAMP, cost FLOAT64, 
  product_category STRING, product_name STRING, product_brand STRING, 
  product_retail_price FLOAT64, product_department STRING, product_sku STRING, 
  product_distribution_center_id INT64,
  FOREIGN KEY (product_distribution_center_id) 
  REFERENCES bigquery-public-data.thelook_ecommerce.distribution_centers(id),
  FOREIGN KEY (product_id) 
  REFERENCES bigquery-public-data.thelook_ecommerce.products(id)
);

CREATE TABLE `bigquery-public-data.thelook_ecommerce.order_items` (
  id INT64, order_id INT64, user_id INT64, product_id INT64, inventory_item_id INT64, 
  status STRING, created_at TIMESTAMP, shipped_at TIMESTAMP, delivered_at TIMESTAMP, 
  returned_at TIMESTAMP, sale_price FLOAT64,
  FOREIGN KEY (order_id) REFERENCES bigquery-public-data.thelook_ecommerce.orders(order_id),
  FOREIGN KEY (user_id) REFERENCES bigquery-public-data.thelook_ecommerce.users(id),
  FOREIGN KEY (product_id) REFERENCES bigquery-public-data.thelook_ecommerce.products(id),
  FOREIGN KEY (inventory_item_id) 
  REFERENCES bigquery-public-data.thelook_ecommerce.inventory_items(id)
);

CREATE TABLE `bigquery-public-data.thelook_ecommerce.orders` (
  order_id INT64, user_id INT64, status STRING, gender STRING, created_at TIMESTAMP, 
  returned_at TIMESTAMP, shipped_at TIMESTAMP, delivered_at TIMESTAMP, num_of_item INT64
);

CREATE TABLE `bigquery-public-data.thelook_ecommerce.products` (
  id INT64, cost FLOAT64, category STRING, name STRING, brand STRING, retail_price FLOAT64, 
  department STRING, sku STRING, distribution_center_id INT64,
  FOREIGN KEY (distribution_center_id) 
  REFERENCES bigquery-public-data.thelook_ecommerce.distribution_centers(id)
);

CREATE TABLE `bigquery-public-data.thelook_ecommerce.users` (
  id INT64, first_name STRING, last_name STRING, email STRING, age INT64, gender STRING, 
  state STRING, street_address STRING, postal_code STRING, city STRING, country STRING, 
  latitude FLOAT64, longitude FLOAT64, traffic_source STRING, created_at TIMESTAMP
);

This command is the DDL retrieved from the result of the query given in Part1 with the added constraints. While BigQuery does not support direct ERD generation, we can use the Datascale platform to easily visualize the DDL of the above query, as shown below.

Thelook e-commerce ER diagram

Datascale provides capabilities to visualize SQL schemas and ERDs. You can use the DDL from the above queries to create an ERD. Following these steps to use the Datascale platform.

  1. Login to Datascale: Login or sign up with a Google account.

  2. Navigate to the SQL models: Go to the SQL models section on the left panel and create a project.

  3. Input DDL Statements: Add a new SQL and paste your SQL code into the workspace. It will automatically save.

  4. Generate ERD: Click on the “Diagram” and then “ER Diagram” on the right panel. It will show the column-level diagram.

Alternatively, you can see the visualization in action from this link: https://datascale.cloud/sqls/bq-schemaddl-2ZHeQ0wfUw4xSE3Xg5BDdguHc8RCewhSaXW1VtWs

DDL to ER diagram using Datascale

In this part, we demonstrated how to define tables and constraints using BigQuery DDL, and how to visualize the resulting schema using the Datascale platform. This visualization describes the column-level relationships, which will be useful during the transformation of data.

What can we continue with this ER diagram?

In Part 3, we will build on the ER diagram from Part 2 to dive into the transformation of data by focusing on the various modeling layers: base, stage, intermediate, and fact/dimension tables.

We will demonstrate how data can be transformed into business insights and highlight the importance of ER diagrams in this process.

Related blogs

Contents

Section

Get a clear view of your SQL dependencies

Datascale helps reverse engineer data models from existing schema and SQL queries

Learn more

Where data gets modeled

Where data gets modeled