Data Productivity Cloud Pipeline

Author: Ian
Date Posted: Jan 29, 2024
Last Modified: Nov 29, 2024

Full Load Strategy Medallion Schema - Salesforce example

Full data refresh strategy with a Medallion data architecture, demonstrated using Salesforce accounts.

Three pipelines that demonstrate a full data replication strategy, using a Medallion (bronze/silver/gold) data architecture.

Open the orchestration pipeline qs-full-load-salesforce first.

Image ofFull Load Strategy Medallion data architecture - orchestration
Full Load Strategy Medallion data architecture - orchestration

To configure the orchestration pipeline, follow Variables (x) and then the edit icon to set the default value of the SalesforceOAuthName variable to the name of your own Salesforce OAuth connection.

Image ofSet pipeline variable default value
Set pipeline variable default value

After updating the default value of the SalesforceOAuthName variable the pipeline should validate cleanly.

Salesforce Accounts full load

This orchestration Data Productivity Cloud pipeline copies all records from your Salesforce Accounts table.

  • Target: bronze layer table mtln_qs_sf_account
  • Granularity: one record per Salesforce Account

Note this table is relatively throwaway: it gets dropped and recreated every time the connector component runs. Furthermore the schema is liable to change according to which columns are chosen in the connector.

The orchestration pipeline ensures that the silver layer ODS table mtln_qs_sf_ods_account exists, along with a sequence to provide a surrogate key. The ODS table is the permanent replicated copy, so is only created if it does not already exist.

To complete the orchestration, the pipeline runs the bronze-to-silver and the silver-to-gold data transformations. Note that these are identical to the transformation pipelines in the incremental data refresh strategy, thanks to the logic separation that comes from using the medallion data architecture.

Bronze to Silver - Account ODS table

The first Data Productivity Cloud transformation pipeline takes the raw, bronze layer account data and performs some basic transformations to change the data into a more generic operational data store (ODS) schema.

Image ofBronze to Silver - Account ODS table
Bronze to Silver - Account ODS table
  • The Rank and Filter combination ensures there is no duplication in the data
  • The Rename component converts the schema from Salesforce Account into a general purpose Account table
  • The Table Update performs a MERGE into the target, with either an INSERT or an UPDATE according to the primary key. If your table does not have a primary key, the full data refresh strategy allows you to use a Table Output in “truncate” output mode instead.
  • The surrogate key of the ODS table is an alternative primary key, and is generated from the sequence database object

Using a Snowflake database sequence as a surrogate key generator

This is a simple solution, implemented by setting the default value of the surrogate_key column to "mtln_qs_sf_ods_account_seq".NEXTVAL in the Create Silver ODS Table component. After providing a dynamic default value in this way, you can safely omit that column from DML operations. The default will always be applied upon insert.

But sequence objects are known to leave gaps in the generated sequence numbers. This is an unavoidable consequence of the internal performance enhancement of issuing new sequence numbers in batches. Exactly the same applies to AUTOINCREMENT and IDENTITY columns.

As an alternative you may consider using a ROW_NUMBER() function in a Rank component that issues an ordinal number to new rows. This does require at least one extra join back to the source table, so is more sophisticated at the cost of being marginally slower at runtime.

Silver to Gold - Account fact and dimension

The second Data Productivity Cloud transformation pipeline starts with the ODS account table, and transforms it into an easily consumable star schema. It includes an account fact table and an account dimension table.

Image ofSilver to Gold - Account fact and dimension tables
Silver to Gold - Account fact and dimension tables

The dimension table mtln_qs_dim_account has one record per account, with its surrogate key taken from the ODS account table. It is a simple type 1 dimension

The fact table mtln_qs_fact_account also contains one record per account. Three foreign key columns are added:

  • dim_account_sk - foreign key to mtln_qs_dim_account
  • dim_date_sk - foreign key to a date dimension (not implemented in this example)
  • dim_time_sk - foreign key to a time dimension (not implemented in this example)

How to run the pipeline

Validate and run qs-full-load-salesforce after setting your Salesforce OAuth as described above.

Run the pipeline again some time later and note how the ODS and star schema tables are updated to show all changes and additions that have been made at source.


Downloads

Licensed under: Matillion Free Subscription License

Installation Instructions

How to Install a Data Productivity Cloud Pipeline