The Challenge:

Imagine you have a JSON file, sourced from an API response, that logs customer transactions. Each transaction record is dense — with nested data detailing items, service charges, discounts, and more. Your goal is to structure this data into a relational format within BigQuery — preserving the intricate details without losing the overarching context of each transaction. You could modify your python script to unnest these components into their own individual dataframes, but you’d prefer to have the progression from raw to structured data take place in a manner that is easy to audit and to trace in your Data Architecture. This is a perfect use-case for dbt.

The Solution — Step-by-Step Unnesting with dbt and BigQuery:

Step 1 — Create a Raw Data Table BigQuery: Begin by creating a new table in your database that contains the raw JSON response. Don’t worry about renaming columns or even fixing data types at this point — all of that can be handled within dbt.

The raw dataframe after being imported to BigQuery
WITH base AS (
  SELECT
    order_id,
    JSON_EXTRACT(item, '$') AS item_json
  FROM
    {{ source('your_dataset', 'raw_orders') }}
),
unnested_items AS (
  SELECT
    base.order_id,
    JSON_EXTRACT_SCALAR(item, '$.item_id') AS item_id,
    JSON_EXTRACT_SCALAR(item, '$.quantity') AS quantity,
    JSON_EXTRACT_SCALAR(item, '$.price') AS individual_item_price,
    -- Keeping other nested fields as is for now
    JSON_EXTRACT(item, '$.service_charge') AS service_charge_nested,
    JSON_EXTRACT(item, '$.discount') AS discount_nested,
  FROM
    base,
    UNNEST(JSON_EXTRACT_ARRAY(base.item_json, '$')) AS item
)

SELECT * FROM unnested_items

Other tips to keep in mind

  1. Make Sure Your Nested JSON Data is Actually in a Readable JSON Format: You might need to add a few additional lines of code when creating your dataframe (assuming you’re extracting this info from an API) to ensure that the nested JSON is actually formatted properly. If it is not, the unnesting process we walked through above won’t work
  2. Be Intentional About How Far You Want the Unnesting to Go: In working with these JSON responses, you may run into instances with recursively nested child items that go many layers deep. While you can certainly untangle this thread all the way until the end, chances are it might not be entirely necessary. Do some investigation into the nature of these child items — are these additionally nested layers adding value to your analysis? Do they contain amounts that are not also summarized by the parent item, or are they redundant?
  3. Brush up on your dbt documentation: If you’re new to dbt, or if you haven’t used the tool in a while take a moment to brush up on all the how-tos and best practices. This link is a great place to start!