Author: Ian
Date Posted: Jan 29, 2024
Last Modified: Nov 29, 2024
Full Load Strategy Medallion Schema - Salesforce example
Full data refresh strategy with a Medallion data architecture, demonstrated using Salesforce accounts.
Three pipelines that demonstrate a full data replication strategy, using a Medallion (bronze/silver/gold) data architecture.
Open the orchestration pipeline qs-full-load-salesforce
first.
To configure the orchestration pipeline, follow Variables (x)
and then the edit icon to set the default value of the SalesforceOAuthName
variable to the name of your own Salesforce OAuth connection.
After updating the default value of the SalesforceOAuthName
variable the pipeline should validate cleanly.
Salesforce Accounts full load
This orchestration Data Productivity Cloud pipeline copies all records from your Salesforce Accounts table.
- Target: bronze layer table
mtln_qs_sf_account
- Granularity: one record per Salesforce Account
Note this table is relatively throwaway: it gets dropped and recreated every time the connector component runs. Furthermore the schema is liable to change according to which columns are chosen in the connector.
The orchestration pipeline ensures that the silver layer ODS table mtln_qs_sf_ods_account
exists, along with a sequence to provide a surrogate key. The ODS table is the permanent replicated copy, so is only created if it does not already exist.
To complete the orchestration, the pipeline runs the bronze-to-silver and the silver-to-gold data transformations. Note that these are identical to the transformation pipelines in the incremental data refresh strategy, thanks to the logic separation that comes from using the medallion data architecture.
Bronze to Silver - Account ODS table
The first Data Productivity Cloud transformation pipeline takes the raw, bronze layer account data and performs some basic transformations to change the data into a more generic operational data store (ODS) schema.
- The Rank and Filter combination ensures there is no duplication in the data
- The Rename component converts the schema from Salesforce Account into a general purpose Account table
- The Table Update performs a
MERGE
into the target, with either anINSERT
or anUPDATE
according to the primary key. If your table does not have a primary key, the full data refresh strategy allows you to use a Table Output in “truncate” output mode instead. - The surrogate key of the ODS table is an alternative primary key, and is generated from the sequence database object
Using a Snowflake database sequence as a surrogate key generator
This is a simple solution, implemented by setting the default value of the surrogate_key
column to "mtln_qs_sf_ods_account_seq".NEXTVAL
in the Create Silver ODS Table
component. After providing a dynamic default value in this way, you can safely omit that column from DML operations. The default will always be applied upon insert.
But sequence objects are known to leave gaps in the generated sequence numbers. This is an unavoidable consequence of the internal performance enhancement of issuing new sequence numbers in batches. Exactly the same applies to AUTOINCREMENT
and IDENTITY
columns.
As an alternative you may consider using a ROW_NUMBER()
function in a Rank component that issues an ordinal number to new rows. This does require at least one extra join back to the source table, so is more sophisticated at the cost of being marginally slower at runtime.
Silver to Gold - Account fact and dimension
The second Data Productivity Cloud transformation pipeline starts with the ODS account table, and transforms it into an easily consumable star schema. It includes an account fact table and an account dimension table.
The dimension table mtln_qs_dim_account
has one record per account, with its surrogate key taken from the ODS account table. It is a simple type 1 dimension
The fact table mtln_qs_fact_account
also contains one record per account. Three foreign key columns are added:
dim_account_sk
- foreign key tomtln_qs_dim_account
dim_date_sk
- foreign key to a date dimension (not implemented in this example)dim_time_sk
- foreign key to a time dimension (not implemented in this example)
How to run the pipeline
Validate and run qs-full-load-salesforce
after setting your Salesforce OAuth as described above.
Run the pipeline again some time later and note how the ODS and star schema tables are updated to show all changes and additions that have been made at source.
Downloads
Licensed under: Matillion Free Subscription License
- Download full-load-salesforce_Snowflake.zip
- Target: Snowflake