Author: Matillion
Date Posted: Mar 10, 2025
Last Modified: Mar 12, 2025
Slowly Changing Dimensions (Manual)
Manage Slowly Changing Dimension tables in a Data Productivity Cloud pipeline, including type 1 and type 2 SCDs.
These pipelines maintain Slowly Changing Dimension tables for “event” reference data. Events are identified by an event_nk
natural key, and have an event_type
and an event_category
.
This method of maintaining SCD tables is presented as an alternative to automatically maintaining SCD views.
Start by running Physical SCD begin setup
to create and initialize all the tables.

Afterwards, run Physical SCD run
multiple times to try the pipelines with test data. Every run creates new, random data in the input table STG_EVENT_DIM_EXAMPLE
, which is fed into the dimension tables.

Type 1 Slowly Changing Dimension
A “type 1” SCD table doesn’t record any history of the past values of event_type
or event_category
: it always contains the latest data.
New column added:
Parameter | Description |
---|---|
event_sk | Surrogate key for the dimension. Take a look at these surrogate key techniques for reference |
The Physical SCD EVENT_DIM_T1
pipeline reads from STG_EVENT_DIM_EXAMPLE
and adds or updates any changes into the EVENT_DIM_T1
table.
The target dimension table is also used as an input, for multiple purposes:
- to detect new event records (the ‘C’ indicator meaning ‘changed’)
- to detect updated event records (the ‘N’ indicator meaning ’new’)
- to generate new surrogate key values, with the
New Row Rank
andmax_event_sk
components. This example shows one of many ways to generate surrogate keys

Note that this solution does not consider logical deletes. If one of the events does not appear in the input, the pipeline assumes it has not changed.
Type 2 Slowly Changing Dimension
A “type 2” SCD table records every change to the event_type
or event_category
columns. Every new or changed value generates a new record in the table.
New columns added:
Parameter | Description |
---|---|
event_sk | Surrogate key for the dimension. Same as above, there are more surrogate key techniques available as alternatives |
valid_from | Timestamp when this version of the event became current |
valid_to | Timestamp when this version of the event ceased to be current |
is_latest | Indicates if this version of the event is the latest known |
version | Version number for the event, ascending as changes accumulate |
The Physical SCD EVENT_DIM_T2 (Low Code)
pipeline reads from STG_EVENT_DIM_EXAMPLE
and adds or updates any changes into the EVENT_DIM_T2
table.
In this pipeline the target dimension table is also used as an input, to detect new or updated event records: the ‘C’ or ‘N’ indicator, same as in the type 1 SCD pipeline. Note that the comparison is made against the latest values in the table (where is_latest
is true).
Validity time ranges valid_from
and valid_to
contain the timestamps when each version of the record is (or was) correct:
- There is no time information in the input data, so the timestamp of the pipeline run is used
- The “previous” version of every record is set to expire one second before the next one
- The “current” version of an event expires at a hardcoded date far in the future
The main complexity of maintaining SCD type 2 tables is that changes require DML on two records:
- An update to the previous version of the record, to set
is_latest
attribute to false, and give it avalid_to
timestamp - Insert a new record
Brand new events require only one change:
- Insert a new record

Two more possible scenarios are not considered by this pipeline:
- Logical deletes are ignored, same as the SCD type 1 above. When an event does not appear in the input, the pipeline assumes it has not changed. Handling logical deletes would involve extra processing: for example requiring that all events are expected every time, and closing the validity of any that are not present.
- Historical updates also present an additional complication for manually-maintained type 2 SCDs. Adjustments that need to be backdated involve carefully altering historical record validity ranges in the dimension table.
Type 2 Slowly Changing Dimension (high code version)
On some platforms, the exact same SCD type 2 logic is demonstrated in a high code version of the pipeline.

Note the sophisticated join clauses in the Type 2 SCD logic
which are needed to handle cases where an optional attribute changes from NULL to non-NULL.
Downloads
Licensed under: Matillion Free Subscription License
- Download SCD-Physical-Databricks.zip
- Target: Databricks
- Download SCD-Physical-Redshift.zip
- Platform: AWS
- Target: Redshift
- Download SCD-Physical-Snowflake.zip
- Target: Snowflake