Author: Matillion
Date Posted: Nov 24, 2023
Last Modified: Nov 30, 2023
Slowly Changing Dimensions
Example Matillion ETL jobs that maintain Type 1 and Type 2 Slowly Changing Dimension (SCD) tables.
This is a set of related jobs that demonstrate various techniques for maintaining Type 1 and Type 2 SCD tables.
Data Model
The example is a general purpose “event” dimension, where
- Events are uniquely identified by a short natural key (example:
e1
) - All events have a
type
and acategory
Running these jobs will create three tables and one sequence in your default database and schema.
EVENT_DIM_T1
A type 1 “event” dimension, with no history tracking.
Column | Description |
---|---|
event_sk | Surrogate key for the dimension. This would be used as the foreign key on a fact table. Enforced unique. In these examples the value is maintained manually |
event_nk | Natural key of the event |
event_type | The event type |
event_category | The event category |
Like any type 1 dimension, there are two alternaltive unique identifiers
event_sk
- registered as a unique keyevent_nk
- kept unique by the update processing. This is how a fact table would look up against this type 1 dimension
EVENT_DIM_T2
A type 2 “event” dimension, with history tracking.
Column | Description |
---|---|
event_sk | Surrogate key for the dimension. This would be used as the foreign key on a fact table. Enforced unique, and maintained automatically using the SEQ_EVENT_DIM sequence |
event_nk | Natural key of the event |
event_type | The event type |
event_category | The event category |
valid_from | Timestamp this version of the record became valid |
valid_to | Timestamp this version of the record ceased to be valid. Within every event_nk , the combination of valid_from and valid_to should form a continuous, non overlapping range of time |
is_latest | Boolean flag indicating if this version of the record is currently the latest |
version | An ascending integer version number. Within every event_nk , the record with the highest version has is_latest set to TRUE |
Filtering this table on is_latest
= TRUE should produce the same data as the type 1 dimension.
Like any type 2 dimension, there are several unique identifiers
event_sk
- registered as a unique keyevent_nk
+version
- kept unique by the update processingevent_nk
+ a timestamp betweenvalid_from
andvalid_to
- this theta join is how a fact table would look up against this type 2 dimension
STG_EVENT_DIM_EXAMPLE
Used as the data source for the dimension tables. In a medallion data architecture this would be a bronze layer object (two-tier model) or a silver layer object (three tier model).
Column | Description |
---|---|
event_nk | Natural key of the event. Enforced unique |
event_type | The event type |
event_category | The event category |
The Example SCD run
job creates a new set of randomized test data in this table.
How to use the jobs
There are five jobs in the download:
- Example SCD begin setup - run this first to set up the examples by creating the database objects
- Example SCD run - run this repeatedly to send test data through the transformation jobs
- Example SCD EVENT_DIM_T1 - an example transformation job to maintain a Type 1 dimension
- Example SCD EVENT_DIM_T2 (High Code) - a high code example transformation job that maintains a Type 2 dimension
- Example SCD EVENT_DIM_T2 (Low Code) - the low code (more visual) example transformation job that maintains the same Type 2 dimension
Start with Example SCD begin setup
, and run it once. It will create all the database objects. If you run it again a second time, it will re-create them all empty again.
Now run Example SCD run
to send the first set of test data through the transformation jobs. It populates the type 1 and the type 2 dimension tables from the same set of test data. Check the data in the dimension tables by using a SQL client, or the sample tab of one of the Table Input components in the transformation jobs.
For the type 2 dimension, you can switch between the high- and low-code variations by updating the Run Transformation component. The high code version is slightly faster at runtime, but requires hand coding to maintain. The results in the tables are identical whichever one you use.
Run Example SCD run
several times to build up some history in the type 2 dimension.
Take a look at how the transformation jobs work. You are free to copy and customize them for your own needs.
Downloads
Licensed under: Matillion Free Subscription License
- Download METL-sf-1.61.6-slowly-changing-dimensions.json
- Target: Snowflake
- Version: 1.61.6 or higher