Author: Ian
Date Posted: Feb 2, 2024
Last Modified: Nov 29, 2024
Incremental Load Strategy with a Medallion Schema - Salesforce example
Three pipelines that demonstrate an incremental data refresh strategy, using a Medallion (bronze/silver/gold) data architecture, with Salesforce accounts as the example source.
Open the orchestration pipeline qs-incremental-load-salesforce first.

To configure the orchestration pipeline, open Variables (x) and then click the edit icon to set the default value of the SalesforceOAuthName variable to the name of your own Salesforce OAuth connection.

After updating the default value of the SalesforceOAuthName variable, the pipeline should validate cleanly.
Salesforce Accounts incremental load
This orchestration Data Productivity Cloud pipeline copies all records from your Salesforce Accounts table that have been updated since the last time the pipeline ran. The first time it runs, all the records will be copied.
- Target: bronze layer table mtln_qs_sf_account
- Granularity: one record per Salesforce Account
Note that this table is relatively throwaway: it is dropped and recreated every time the connector component runs. Furthermore, its schema is liable to change according to which columns are chosen in the connector.
The incremental load works in two stages:
- Check the maximum LastModifiedDate that has already been copied. This is done by the Set HighWaterMark component, which updates the HighWaterMark pipeline variable.
- In the Salesforce Query, use HighWaterMark in the Data Source Filter to request only more recently updated records. The HighWaterMark variable is declared as a Text type to avoid implicit format conversions. Note the explicit format mask in the SQL query inside the Set HighWaterMark component.
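The two stages above can be sketched in SQL. This is a sketch only: the lower-case ODS column name and the ISO 8601 format mask are assumptions, not the exact contents of the Set HighWaterMark component.

```sql
-- Sketch of the Set HighWaterMark logic: find the newest LastModifiedDate
-- already copied into the silver ODS table, formatted as text so it can be
-- held in a Text pipeline variable without implicit format conversion.
SELECT COALESCE(
         TO_CHAR(MAX(last_modified_date), 'YYYY-MM-DD"T"HH24:MI:SS"Z"'),
         '1900-01-01T00:00:00Z'   -- first run: no rows yet, so copy everything
       ) AS high_water_mark
FROM mtln_qs_sf_ods_account;

-- The Salesforce Query's Data Source Filter then applies a condition
-- equivalent to:
--   LastModifiedDate > ${HighWaterMark}
```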
The orchestration pipeline ensures that the silver layer ODS table mtln_qs_sf_ods_account exists, along with a sequence to provide a surrogate key. The ODS table is the permanent replicated copy, so is only created if it does not already exist.
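The DDL the pipeline runs is along these lines. It is a minimal sketch: the table and sequence names come from this example, but the column list is abbreviated and the column names are assumptions.

```sql
-- Sequence that supplies the surrogate key. Like the ODS table, it is
-- created only if it does not already exist.
CREATE SEQUENCE IF NOT EXISTS "mtln_qs_sf_ods_account_seq";

CREATE TABLE IF NOT EXISTS mtln_qs_sf_ods_account (
    surrogate_key       NUMBER DEFAULT "mtln_qs_sf_ods_account_seq".NEXTVAL,
    account_id          VARCHAR,         -- Salesforce primary key
    account_name        VARCHAR,
    last_modified_date  TIMESTAMP_NTZ
    -- ...remaining generic account columns
);
```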
To complete the orchestration, the pipeline runs the bronze-to-silver and the silver-to-gold data transformations. Note that these are identical to the transformation pipelines in the full data refresh strategy, thanks to the logic separation that comes from using the medallion data architecture.
Bronze to Silver - Account ODS table
The first Data Productivity Cloud transformation pipeline takes the raw, bronze layer account data and performs some basic transformations to change the data into a more generic operational data store (ODS) schema.

- The Rank and Filter combination ensures there is no duplication in the data
- The Rename component converts the schema from Salesforce Account into a general purpose Account table
- The Table Update performs a MERGE into the target, with either an INSERT or an UPDATE according to the primary key
- The surrogate key of the ODS table is an alternative primary key, and is generated from the sequence database object
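The Table Update's behaviour corresponds to a MERGE along these lines. This is a sketch under assumed column names (id, name, lastmodifieddate on the bronze side), not the component's exact generated SQL.

```sql
MERGE INTO mtln_qs_sf_ods_account tgt
USING mtln_qs_sf_account src             -- deduplicated bronze data
   ON tgt.account_id = src.id            -- match on the Salesforce primary key
WHEN MATCHED THEN UPDATE SET
     tgt.account_name       = src.name,
     tgt.last_modified_date = src.lastmodifieddate
WHEN NOT MATCHED THEN INSERT
     (account_id, account_name, last_modified_date)
     VALUES (src.id, src.name, src.lastmodifieddate);
-- surrogate_key is omitted from the INSERT: its DEFAULT (the sequence)
-- is applied automatically for new rows
```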
Prerequisites for using an incremental load strategy
The table must have an always-ascending attribute such as a last-updated timestamp, which the source system updates every time a record is created or modified. Salesforce implements this pattern consistently and reliably. In the absence of a timestamp you may be able to use an ascending integer such as an audit field if it exists.
The table must have a primary key. This allows the Table Update to distinguish between inserts and updates.
Hard deletes are difficult to track using this technique. If a record has been deleted from the source, it will never appear among the updated records because it no longer exists. There is more discussion of this subject in the article Understanding Full Load Data Replication.
Using a Snowflake database sequence as a surrogate key generator
This is a simple solution, implemented by setting the default value of the surrogate_key column to "mtln_qs_sf_ods_account_seq".NEXTVAL in the Create Silver ODS Table component. After providing a dynamic default value in this way, you can safely omit that column from DML operations. The default will always be applied upon insert.
But sequence objects are known to leave gaps in the generated sequence numbers. This is an unavoidable consequence of the internal performance enhancement of issuing new sequence numbers in batches. Exactly the same applies to AUTOINCREMENT and IDENTITY columns.
As an alternative you may consider using a ROW_NUMBER() function in a Rank component that issues an ordinal number to new rows. This does require at least one extra join back to the source table, so is more sophisticated at the cost of being marginally slower at runtime.
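One way to sketch that alternative is below. It is an assumption of the general shape, not the exact Rank component output: new rows are identified by the extra join back to the target, and each is given an ordinal offset by the current maximum surrogate key.

```sql
-- Gap-free surrogate key assignment for new rows only.
SELECT src.*,
       (SELECT COALESCE(MAX(surrogate_key), 0)
          FROM mtln_qs_sf_ods_account)
       + ROW_NUMBER() OVER (ORDER BY src.id) AS surrogate_key
FROM mtln_qs_sf_account src
LEFT JOIN mtln_qs_sf_ods_account tgt
       ON tgt.account_id = src.id        -- the extra join mentioned above
WHERE tgt.account_id IS NULL;            -- keep only rows not yet in the target
```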
Silver to Gold - Account fact and dimension
The second Data Productivity Cloud transformation pipeline starts with the ODS account table, and transforms it into an easily consumable star schema. It includes an account fact table and an account dimension table.

The dimension table mtln_qs_dim_account has one record per account, with its surrogate key taken from the ODS account table. It is a simple type 1 dimension.
The fact table mtln_qs_fact_account also contains one record per account. Three foreign key columns are added:
- dim_account_sk - foreign key to mtln_qs_dim_account
- dim_date_sk - foreign key to a date dimension (not implemented in this example)
- dim_time_sk - foreign key to a time dimension (not implemented in this example)
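A consumer query against the star schema might look like the following sketch. The dimension's surrogate key column name (account_sk) and the dimension attribute names are assumptions for illustration.

```sql
-- Join the fact to its dimension on the surrogate key.
SELECT d.account_name,
       f.dim_date_sk,
       f.dim_time_sk
FROM mtln_qs_fact_account f
JOIN mtln_qs_dim_account  d
  ON d.account_sk = f.dim_account_sk;
```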
How to run the pipeline
Validate and run qs-incremental-load-salesforce after setting your Salesforce OAuth as described above.
Run the pipeline again some time later and note how the ODS and star schema tables are updated to show all changes and additions that have been made at source.
Downloads
Licensed under: Matillion Free Subscription License
- Download incremental-load-salesforce_Snowflake.zip
- Target: Snowflake