Author: Matillion
Date Posted: Mar 11, 2025
Last Modified: Mar 12, 2025
Slowly Changing Dimensions (Automated)
Manage Slowly Changing Dimension views in a Data Productivity Cloud pipeline, including type 1, type 2, type 3 and type 6 SCDs.
These pipelines maintain Slowly Changing Dimension views for “event” reference data. Events are identified by an `event_nk` natural key, and have an `event_type` and an `event_category`.
This method of automatically maintaining SCD views from a single reference table is presented as an alternative to manually maintaining SCD tables.
Start by running Logical SCD begin setup to create and initialize all the database objects.

Afterwards, run Logical SCD run multiple times to try the pipelines with test data. Every run creates new, random data in the input table `STG_EVENT_DIM_EXAMPLE`, which is added to `NORM_EVENT_REF` so it automatically appears in the dimension views.

The backing table `NORM_EVENT_REF`
This is the only permanent storage, and is used to generate all the different dimension variations.
NORM_EVENT_REF records all available information about the event dimension, and the timestamp when data was recorded. The columns are:
| Column | Description |
|---|---|
| event_sk | Surrogate key column, automatically generated, and used for the surrogate key columns in the dimensions. Take a look at these surrogate key techniques for reference |
| event_nk | The natural key of the event |
| event_type | An attribute of “events” that may change over time |
| event_category | Another attribute of “events” that also may change over time |
| as_at | Timestamp when this version of the event was recorded |
Storing data in the backing table is a fast, append-only operation. You may choose to add a filter to avoid storing duplicate data.
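As a sketch of the backing table and its append-only load, here is a SQLite stand-in for the warehouse objects (the staging rows and the exact duplicate-filter predicate are illustrative assumptions, not part of the shipped pipelines; requires SQLite ≥ 3.25):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE NORM_EVENT_REF (
    event_sk       INTEGER PRIMARY KEY AUTOINCREMENT,  -- generated surrogate key
    event_nk       TEXT NOT NULL,                      -- natural key of the event
    event_type     TEXT,
    event_category TEXT,
    as_at          TEXT NOT NULL                       -- when this version was recorded
);
CREATE TABLE STG_EVENT_DIM_EXAMPLE (
    event_nk TEXT, event_type TEXT, event_category TEXT
);
INSERT INTO STG_EVENT_DIM_EXAMPLE VALUES
    ('E1', 'conference', 'external'),
    ('E2', 'webinar',    'internal');
""")

def load():
    # Fast, append-only load, with an optional filter that skips any staging
    # row identical to the latest stored version of the same event_nk
    con.execute("""
        INSERT INTO NORM_EVENT_REF (event_nk, event_type, event_category, as_at)
        SELECT s.event_nk, s.event_type, s.event_category, datetime('now')
        FROM STG_EVENT_DIM_EXAMPLE s
        WHERE NOT EXISTS (
            SELECT 1 FROM NORM_EVENT_REF n
            WHERE n.event_nk = s.event_nk
              AND n.event_type = s.event_type
              AND n.event_category = s.event_category
              AND n.as_at = (SELECT MAX(as_at) FROM NORM_EVENT_REF n2
                             WHERE n2.event_nk = n.event_nk)
        )""")
    con.commit()

load()
load()  # a second, unchanged load appends nothing
print(con.execute("SELECT COUNT(*) FROM NORM_EVENT_REF").fetchone()[0])  # 2
```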

Type 1 Slowly Changing Dimension
A “type 1” SCD doesn’t record any history of the past values of `event_type` or `event_category`: it always contains the latest data.
Deriving this from `NORM_EVENT_REF` is done by keeping only the latest record (by `as_at` timestamp) for every `event_nk`. The Data Productivity Cloud components are a Rank and Filter combination.
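The Rank-and-Filter derivation can be sketched in SQL (SQLite used for illustration; the view name and sample data are assumptions):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE NORM_EVENT_REF (
    event_sk INTEGER, event_nk TEXT, event_type TEXT,
    event_category TEXT, as_at TEXT
);
INSERT INTO NORM_EVENT_REF VALUES
    (1, 'E1', 'conference', 'external', '2025-01-01'),
    (2, 'E1', 'webinar',    'external', '2025-02-01'),
    (3, 'E2', 'meetup',     'internal', '2025-01-15');

-- Rank (ROW_NUMBER per event_nk, newest first) then Filter (rn = 1)
CREATE VIEW DIM_EVENT_T1 AS
SELECT event_sk, event_nk, event_type, event_category
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY event_nk
                              ORDER BY as_at DESC) AS rn
    FROM NORM_EVENT_REF
)
WHERE rn = 1;
""")

rows = con.execute(
    "SELECT event_nk, event_type FROM DIM_EVENT_T1 ORDER BY event_nk"
).fetchall()
print(rows)  # [('E1', 'webinar'), ('E2', 'meetup')]
```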

Type 2 Slowly Changing Dimension
A “type 2” SCD records every change to the `event_type` or `event_category` columns. Every new or changed value generates a new record.
This is the simplest SCD type to derive from the backing table, because the granularity is already correct. No filtering is needed. The time variance metadata columns are derived like this:
- valid_from - just the `as_at` timestamp
- valid_to - the `as_at` timestamp of the following record with the same `event_nk`, using a LEAD analytic function. If there is none, use a hardcoded date far in the future
- is_latest - true if this record has the highest `as_at` timestamp for its `event_nk`, using a ROW_NUMBER analytic function
- version - an ascending integer sequence per `event_nk`, set with another ROW_NUMBER analytic function
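The four metadata columns can be sketched like this (SQLite for illustration; the far-future date and all object names are assumptions):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE NORM_EVENT_REF (
    event_sk INTEGER, event_nk TEXT, event_type TEXT,
    event_category TEXT, as_at TEXT
);
INSERT INTO NORM_EVENT_REF VALUES
    (1, 'E1', 'conference', 'external', '2025-01-01'),
    (2, 'E1', 'webinar',    'external', '2025-02-01');

CREATE VIEW DIM_EVENT_T2 AS
SELECT event_sk, event_nk, event_type, event_category,
       -- valid_from: just the as_at timestamp
       as_at AS valid_from,
       -- valid_to: as_at of the next version, else a far-future date
       COALESCE(LEAD(as_at) OVER (PARTITION BY event_nk ORDER BY as_at),
                '9999-12-31') AS valid_to,
       -- is_latest: highest as_at for this event_nk
       CASE WHEN ROW_NUMBER() OVER (PARTITION BY event_nk
                                    ORDER BY as_at DESC) = 1
            THEN 1 ELSE 0 END AS is_latest,
       -- version: ascending integer sequence per event_nk
       ROW_NUMBER() OVER (PARTITION BY event_nk ORDER BY as_at) AS version
FROM NORM_EVENT_REF;
""")

for row in con.execute("""SELECT valid_from, valid_to, is_latest, version
                          FROM DIM_EVENT_T2 ORDER BY version"""):
    print(row)
# ('2025-01-01', '2025-02-01', 0, 1)
# ('2025-02-01', '9999-12-31', 1, 2)
```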

Type 3 Slowly Changing Dimension
A “type 3” SCD is identical to a type 2, with the addition of “prior” values of `event_type` and `event_category`. The new columns are:
- prior_event_type - the `event_type` of the previous record with the same `event_nk`, calculated using a LAG analytic function
- prior_event_category - same, but for `event_category`
These extra columns bring the useful ability for reports to quickly compare against the “previous” situation after a wide-ranging change. A common example is reporting before and after changes to sales region boundaries.
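The LAG derivation, sketched under the same SQLite stand-in assumptions (validity columns omitted for brevity):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE NORM_EVENT_REF (
    event_sk INTEGER, event_nk TEXT, event_type TEXT,
    event_category TEXT, as_at TEXT
);
INSERT INTO NORM_EVENT_REF VALUES
    (1, 'E1', 'conference', 'external', '2025-01-01'),
    (2, 'E1', 'webinar',    'internal', '2025-02-01');

-- Type 3 adds the "prior" columns via LAG over the same window
CREATE VIEW DIM_EVENT_T3 AS
SELECT event_sk, event_nk, event_type, event_category,
       LAG(event_type) OVER (PARTITION BY event_nk
                             ORDER BY as_at) AS prior_event_type,
       LAG(event_category) OVER (PARTITION BY event_nk
                                 ORDER BY as_at) AS prior_event_category
FROM NORM_EVENT_REF;
""")

print(con.execute("""SELECT event_type, prior_event_type
                     FROM DIM_EVENT_T3 ORDER BY event_sk""").fetchall())
# [('conference', None), ('webinar', 'conference')]
```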

Type 6 Slowly Changing Dimension
A “type 6” SCD combines the history tracking of a type 2, with the addition of the “current” values of all attributes taken from the type 1 dimension. This is done using a Join component between the two logic branches. This dimension type is harder to implement as a real table, because every update also requires an update to the “current” values of all prior versions of the record.
The new columns are:
- current_event_type - the `event_type` from the type 1 dimension
- current_event_category - same, but for `event_category`
A type 6 SCD is a good default option, because it gives users a very simple choice between using the “as-was” attributes (correct at the time of the transaction) and the “as-is” attributes (correct right now). No decision ever needs to be made about which attributes are history-tracked.
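The two-branch Join can be sketched as a join between the history rows and the type 1 “current” rows (SQLite stand-in; the type 2 metadata columns are omitted here for brevity, and all names are assumptions):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE NORM_EVENT_REF (
    event_sk INTEGER, event_nk TEXT, event_type TEXT,
    event_category TEXT, as_at TEXT
);
INSERT INTO NORM_EVENT_REF VALUES
    (1, 'E1', 'conference', 'external', '2025-01-01'),
    (2, 'E1', 'webinar',    'internal', '2025-02-01');

-- Branch 1 (type 1): latest row per event_nk
CREATE VIEW DIM_EVENT_T1 AS
SELECT event_nk, event_type, event_category
FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY event_nk
                                   ORDER BY as_at DESC) AS rn
      FROM NORM_EVENT_REF)
WHERE rn = 1;

-- Branch 2 (the type 2 history) joined to branch 1 for the current_* columns
CREATE VIEW DIM_EVENT_T6 AS
SELECT h.event_sk, h.event_nk, h.event_type, h.event_category,
       c.event_type     AS current_event_type,
       c.event_category AS current_event_category
FROM NORM_EVENT_REF h
JOIN DIM_EVENT_T1 c ON c.event_nk = h.event_nk;
""")

print(con.execute("""SELECT event_type, current_event_type
                     FROM DIM_EVENT_T6 ORDER BY event_sk""").fetchall())
# [('conference', 'webinar'), ('webinar', 'webinar')]
```

Because the “current” values are computed at query time in the view, there is no need to update prior versions when a new record arrives, which is what makes a type 6 table hard to maintain physically.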

SCD types 4 and 5
These are not well defined, and are not implemented among these examples. SCD types 4 and 5 are usually interpreted as deploying two database objects per dimension. Either adding a mini-dimension just for volatile attributes, or having one “current” and one “historical” dimension. Sometimes this includes giving fact tables multiple foreign keys to the dimension: for example one for the current attributes and one for the prior.
Logical deletes and updates out of sequence
These pipelines ignore the possibility of logical deletes. When an event does not appear in the input, the pipeline assumes it has not changed. Handling logical deletes would involve extra processing: for example requiring that all events are expected every time, and closing the validity of any that are not present.
Historical updates - sometimes known as updates out-of-sequence - occur when an earlier update to the dimension is at first omitted for some reason; perhaps an artificial historical adjustment. Adding these changes is no problem: it just means adding a new record to `NORM_EVENT_REF` with an `as_at` timestamp that falls somewhere in the past. The logic of the dimension views immediately takes care of adjusting all the validity ranges, current-flags and version numbers. The business may require updates to any related fact tables, since transactions that occurred in the time of the historical change may now link to the wrong dimension record.
Adding fixed “unknown” values
Most dimensions always include one or more records with a fixed surrogate key, for cases where the actual dimension can’t be linked. This usefully allows fact tables to always inner join to all dimensions without losing records. In reports, this manifests as transaction fact records with no “event” information showing “unknown” event_type and event_category, rather than being missing from the report.
The usual convention is that -1 represents “unknown”. This value can be hardcoded in pipelines that create fact table records when the dimension lookup fails.
The fixed records can be introduced with a Fixed Flow and a Unite component in `UNION ALL` mode (i.e. with “Remove Duplicates” set to No) immediately before the view is defined.
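The Fixed Flow plus Unite step corresponds to a `UNION ALL` just before the view definition, sketched here (SQLite stand-in; the `'UNKNOWN'` natural key and `'unknown'` labels are assumed conventions):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE NORM_EVENT_REF (
    event_sk INTEGER, event_nk TEXT, event_type TEXT,
    event_category TEXT, as_at TEXT
);
INSERT INTO NORM_EVENT_REF VALUES
    (1, 'E1', 'conference', 'external', '2025-01-01');

CREATE VIEW DIM_EVENT_T1 AS
SELECT event_sk, event_nk, event_type, event_category
FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY event_nk
                                   ORDER BY as_at DESC) AS rn
      FROM NORM_EVENT_REF)
WHERE rn = 1
UNION ALL
-- Fixed Flow branch: the conventional -1 "unknown" member
SELECT -1, 'UNKNOWN', 'unknown', 'unknown';
""")

print(con.execute(
    "SELECT event_sk, event_type FROM DIM_EVENT_T1 ORDER BY event_sk"
).fetchall())
# [(-1, 'unknown'), (1, 'conference')]
```

Fact-building pipelines can then fall back to `-1` whenever the dimension lookup fails, so the inner join never drops transactions.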

Performance considerations
If your dimensions are large, with either many columns or many records, you may consider materializing them as tables as the final part of an ETL run. This exchanges runtime compute for marginally more storage consumed.
Downloads
Licensed under: Matillion Free Subscription License
- Download SCD-Logical-Databricks.zip
  - Target: Databricks
- Download SCD-Logical-Redshift.zip
  - Platform: AWS
  - Target: Redshift
- Download SCD-Logical-Snowflake.zip
  - Target: Snowflake