Matillion ETL Shared Job

Author: Richard
Date Posted: Dec 4, 2023
Last Modified: Dec 4, 2023

Dynamic Load and Flatten Semi-Structured Data

Easily load semi-structured data (Parquet / Avro / ORC) into Snowflake tables.

Load Parquet / Avro / ORC data into Snowflake tables from a Matillion ETL orchestration job without having to manually define table metadata. The job uses Snowflake's ability to infer columns from the staged file(s).

This is ideal for automating table creation from semi-structured data. It is particularly useful when data in object storage has been client-side encrypted, because it lets you load the data into Snowflake and flatten it without ever needing to see the data.
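
Under the hood this relies on Snowflake's INFER_SCHEMA table function, which reads column names and types directly from the staged files. A minimal sketch of that step, assuming a hypothetical stage called demo_stage and a Parquet file format called demo_parquet_format (neither name comes from the shared job):

    -- demo_parquet_format and demo_stage are placeholder names for this sketch
    CREATE FILE FORMAT IF NOT EXISTS demo_parquet_format TYPE = PARQUET;

    -- Ask Snowflake to derive column names and types from the staged files
    SELECT *
    FROM TABLE(
      INFER_SCHEMA(
        LOCATION    => '@demo_stage/sales/',
        FILE_FORMAT => 'demo_parquet_format'
      )
    );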

Dynamic Load and Flatten

Inputs include the details of a Snowflake Stage, a file path pattern in cloud storage, and the target table name.

The output of Snowflake's INFER_SCHEMA function determines the column names for the new table, with the target columns created as a mixture of VARCHAR and VARIANT types.
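
As a hedged illustration of what that table creation corresponds to in Snowflake SQL (the database, schema, table, stage and file format names below are placeholders, not the exact statements the shared job generates):

    -- Build the target table from the column metadata returned by INFER_SCHEMA
    CREATE TABLE IF NOT EXISTS analytics.staging.sales_raw
      USING TEMPLATE (
        SELECT ARRAY_AGG(OBJECT_CONSTRUCT(*))
        FROM TABLE(
          INFER_SCHEMA(
            LOCATION    => '@demo_stage/sales/',
            FILE_FORMAT => 'demo_parquet_format'
          )
        )
      );

Nested content that lands in VARIANT columns can then be unpacked downstream, for example with Snowflake's FLATTEN table function.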

Parameters

Parameter: Description
Source Data Snowflake Stage Database: Snowflake Database where the source data Stage object is managed
Source Data Snowflake Stage Schema: Snowflake Schema where the source data Stage object is managed
Source Data Snowflake Stage Type: Either S3, AZURE_BLOB or GCS
Source Data Snowflake Stage Name: Name of the External Stage in Snowflake pointing at either S3, Azure Blob or GCS
Source Data File Pattern: Regex to match files in S3, Azure Blob or GCS. Note: the pattern is matched against the complete path of each file, not just the path relative to the directory configured in the Snowflake External Stage, so any subfolder containing the objects to load must be matched here (see the COPY sketch after this table)
Source Data File Type: Either PARQUET, AVRO or ORC
Target Database: Snowflake Database where the target table and temporary objects will be created
Target Schema: Snowflake Schema where the target table and temporary objects will be created
Target Table Name: Name of the target table in Snowflake where the data will be loaded
Force Load Old Files: By default, to avoid duplication, files which have already been loaded into Snowflake are not loaded to the same table again. Set to True to reload all files in the location
Delete Schema Inferrance Table: Set to False to keep the temporary table containing the schema inference information
datafile_metadata: Intermediate job grid variable. Leave blank
target_table_metadata: Intermediate job grid variable. Leave blank
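
For orientation, the Source Data File Pattern and Force Load Old Files parameters behave like the PATTERN and FORCE options of a Snowflake COPY INTO statement. A minimal sketch, again with placeholder object names rather than the SQL the job actually generates:

    COPY INTO analytics.staging.sales_raw
    FROM @demo_stage
    FILE_FORMAT = (FORMAT_NAME = 'demo_parquet_format')
    MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE  -- map file columns onto the inferred table columns
    PATTERN = '.*sales/.*[.]parquet'         -- matched against the complete file path
    FORCE = FALSE;                           -- TRUE reloads files that have already been loaded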

Prerequisites

You must configure an External Stage in Snowflake that points to the location (S3 / Azure Blob / GCS) where the source data files are contained. You can do this in the Matillion ETL UI by right-clicking on an Environment and selecting Manage Stages.
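
Alternatively, the stage can be created directly in Snowflake SQL. A minimal sketch for an S3-backed stage, assuming a storage integration already exists (all names below are placeholders; Azure Blob and GCS stages take a URL and storage integration for those platforms instead):

    -- demo_s3_integration is assumed to be an existing storage integration
    CREATE STAGE IF NOT EXISTS demo_stage
      URL = 's3://demo-bucket/source-data/'
      STORAGE_INTEGRATION = demo_s3_integration
      FILE_FORMAT = (TYPE = PARQUET);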


Downloads

Licensed under: Matillion Free Subscription License

Installation Instructions

How to Install a Matillion ETL Shared Job