Matillion ETL Job

Author: Matillion
Date Posted: Nov 24, 2023
Last Modified: Nov 30, 2023

Slowly Changing Dimensions

Example Matillion ETL jobs that maintain Type 1 and Type 2 Slowly Changing Dimension (SCD) tables.

Slowly Changing Dimensions

This is a set of related jobs that demonstrate various techniques for maintaining Type 1 and Type 2 SCD tables.

Data Model

The example is a general purpose “event” dimension, where

  • Events are uniquely identified by a short natural key (example: e1)
  • All events have a type and a category

Running these jobs will create three tables and one sequence in your default database and schema.

EVENT_DIM_T1

A type 1 “event” dimension, with no history tracking.

ColumnDescription
event_skSurrogate key for the dimension. This would be used as the foreign key on a fact table. Enforced unique. In these examples the value is maintained manually
event_nkNatural key of the event
event_typeThe event type
event_categoryThe event category

Like any type 1 dimension, there are two alternaltive unique identifiers

  • event_sk - registered as a unique key
  • event_nk - kept unique by the update processing. This is how a fact table would look up against this type 1 dimension

EVENT_DIM_T2

A type 2 “event” dimension, with history tracking.

ColumnDescription
event_skSurrogate key for the dimension. This would be used as the foreign key on a fact table. Enforced unique, and maintained automatically using the SEQ_EVENT_DIM sequence
event_nkNatural key of the event
event_typeThe event type
event_categoryThe event category
valid_fromTimestamp this version of the record became valid
valid_toTimestamp this version of the record ceased to be valid. Within every event_nk, the combination of valid_from and valid_to should form a continuous, non overlapping range of time
is_latestBoolean flag indicating if this version of the record is currently the latest
versionAn ascending integer version number. Within every event_nk, the record with the highest version has is_latest set to TRUE

Filtering this table on is_latest = TRUE should produce the same data as the type 1 dimension.

Like any type 2 dimension, there are several unique identifiers

  • event_sk - registered as a unique key
  • event_nk + version - kept unique by the update processing
  • event_nk + a timestamp between valid_from and valid_to - this theta join is how a fact table would look up against this type 2 dimension

STG_EVENT_DIM_EXAMPLE

Used as the data source for the dimension tables. In a medallion data architecture this would be a bronze layer object (two-tier model) or a silver layer object (three tier model).

ColumnDescription
event_nkNatural key of the event. Enforced unique
event_typeThe event type
event_categoryThe event category

The Example SCD run job creates a new set of randomized test data in this table.

How to use the jobs

There are five jobs in the download:

  • Example SCD begin setup - run this first to set up the examples by creating the database objects
  • Example SCD run - run this repeatedly to send test data through the transformation jobs
  • Example SCD EVENT_DIM_T1 - an example transformation job to maintain a Type 1 dimension
  • Example SCD EVENT_DIM_T2 (High Code) - a high code example transformation job that maintains a Type 2 dimension
  • Example SCD EVENT_DIM_T2 (Low Code) - the low code (more visual) example transformation job that maintains the same Type 2 dimension

Start with Example SCD begin setup, and run it once. It will create all the database objects. If you run it again a second time, it will re-create them all empty again.

Example SCD begin setup

Now run Example SCD run to send the first set of test data through the transformation jobs. It populates the type 1 and the type 2 dimension tables from the same set of test data. Check the data in the dimension tables by using a SQL client, or the sample tab of one of the Table Input components in the transformation jobs.

For the type 2 dimension, you can switch between the high- and low-code variations by updating the Run Transformation component. The high code version is slightly faster at runtime, but requires hand coding to maintain. The results in the tables are identical whichever one you use.

Example SCD run

Run Example SCD run several times to build up some history in the type 2 dimension.

Take a look at how the transformation jobs work. You are free to copy and customize them for your own needs.


Downloads

Licensed under: Matillion Free Subscription License