Data Productivity Cloud Pipeline

Author: Matillion
Date Posted: Mar 10, 2025
Last Modified: Mar 12, 2025

Slowly Changing Dimensions (Manual)

Manage Slowly Changing Dimension tables in a Data Productivity Cloud pipeline, including type 1 and type 2 SCDs.

These pipelines maintain Slowly Changing Dimension tables for “event” reference data. Events are identified by an event_nk natural key, and have an event_type and an event_category.

This method of maintaining SCD tables is presented as an alternative to automatically maintaining SCD views.

Start by running Physical SCD begin setup to create and initialize all the tables.

Image ofSlowly Changing Dimension Tables
Slowly Changing Dimension Tables

Afterwards, run Physical SCD run multiple times to try the pipelines with test data. Every run creates new, random data in the input table STG_EVENT_DIM_EXAMPLE, which is fed into the dimension tables.

Image ofRun test data through the Slowly Changing Dimension tables
Run test data through the Slowly Changing Dimension tables

Type 1 Slowly Changing Dimension

A “type 1” SCD table doesn’t record any history of the past values of event_type or event_category: it always contains the latest data.

New column added:

ParameterDescription
event_skSurrogate key for the dimension. Take a look at these surrogate key techniques for reference

The Physical SCD EVENT_DIM_T1 pipeline reads from STG_EVENT_DIM_EXAMPLE and adds or updates any changes into the EVENT_DIM_T1 table.

The target dimension table is also used as an input, for multiple purposes:

  • to detect new event records (the ‘C’ indicator meaning ‘changed’)
  • to detect updated event records (the ‘N’ indicator meaning ’new’)
  • to generate new surrogate key values, with the New Row Rank and max_event_sk components. This example shows one of many ways to generate surrogate keys
Image ofMaintain a SCD type 1 table
Maintain a SCD type 1 table

Note that this solution does not consider logical deletes. If one of the events does not appear in the input, the pipeline assumes it has not changed.

Type 2 Slowly Changing Dimension

A “type 2” SCD table records every change to the event_type or event_category columns. Every new or changed value generates a new record in the table.

New columns added:

ParameterDescription
event_skSurrogate key for the dimension. Same as above, there are more surrogate key techniques available as alternatives
valid_fromTimestamp when this version of the event became current
valid_toTimestamp when this version of the event ceased to be current
is_latestIndicates if this version of the event is the latest known
versionVersion number for the event, ascending as changes accumulate

The Physical SCD EVENT_DIM_T2 (Low Code) pipeline reads from STG_EVENT_DIM_EXAMPLE and adds or updates any changes into the EVENT_DIM_T2 table.

In this pipeline the target dimension table is also used as an input, to detect new or updated event records: the ‘C’ or ‘N’ indicator, same as in the type 1 SCD pipeline. Note that the comparison is made against the latest values in the table (where is_latest is true).

Validity time ranges valid_from and valid_to contain the timestamps when each version of the record is (or was) correct:

  • There is no time information in the input data, so the timestamp of the pipeline run is used
  • The “previous” version of every record is set to expire one second before the next one
  • The “current” version of an event expires at a hardcoded date far in the future

The main complexity of maintaining SCD type 2 tables is that changes require DML on two records:

  • An update to the previous version of the record, to set is_latest attribute to false, and give it a valid_to timestamp
  • Insert a new record

Brand new events require only one change:

  • Insert a new record
Image ofMaintain a SCD type 1 table (low code)
Maintain a SCD type 1 table (low code)

Two more possible scenarios are not considered by this pipeline:

  • Logical deletes are ignored, same as the SCD type 1 above. When an event does not appear in the input, the pipeline assumes it has not changed. Handling logical deletes would involve extra processing: for example requiring that all events are expected every time, and closing the validity of any that are not present.
  • Historical updates also present an additional complication for manually-maintained type 2 SCDs. Adjustments that need to be backdated involve carefully altering historical record validity ranges in the dimension table.

Type 2 Slowly Changing Dimension (high code version)

On some platforms, the exact same SCD type 2 logic is demonstrated in a high code version of the pipeline.

Image ofMaintain a SCD type 1 table (high code)
Maintain a SCD type 1 table (high code)

Note the sophisticated join clauses in the Type 2 SCD logic which are needed to handle cases where an optional attribute changes from NULL to non-NULL.


Downloads

Licensed under: Matillion Free Subscription License

Installation Instructions

How to Install a Data Productivity Cloud Pipeline