Author: Richard
Date Posted: Dec 4, 2023
Last Modified: Dec 4, 2023
Dynamic Load and Flatten Semi-Structured Data
Easily load semi-structured data (Parquet / Avro / ORC) into Snowflake tables.
Load Parquet / Avro / ORC data into Snowflake tables from a Matillion ETL orchestration job, without having to manually define table metadata. The job uses Snowflake's ability to infer columns from the staged file(s).
This is ideal for automating table creation from semi-structured data. It is particularly useful when data in object storage has been client-side encrypted, because the data can be loaded into Snowflake and flattened without anyone ever needing to see it.
Inputs include the details of a Snowflake Stage and a file path pattern in cloud storage, plus the target table name.
The output of INFER_SCHEMA determines the column names for the new table; every target column is created as either VARCHAR or VARIANT.
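For context, this is the kind of SQL the job automates with Snowflake's INFER_SCHEMA table function. A minimal sketch, where the stage, file format, and table names are placeholder examples rather than anything the job itself creates:

```sql
-- Placeholder names throughout; the job generates its own temporary objects.
CREATE FILE FORMAT IF NOT EXISTS my_parquet_format TYPE = PARQUET;

-- Inspect the column names and types Snowflake infers from the staged files
SELECT *
FROM TABLE(
  INFER_SCHEMA(
    LOCATION    => '@my_external_stage/sales/',
    FILE_FORMAT => 'my_parquet_format'
  )
);

-- Create the target table directly from the inferred schema
CREATE TABLE my_target_table
  USING TEMPLATE (
    SELECT ARRAY_AGG(OBJECT_CONSTRUCT(*))
    FROM TABLE(
      INFER_SCHEMA(
        LOCATION    => '@my_external_stage/sales/',
        FILE_FORMAT => 'my_parquet_format'
      )
    )
  );
```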
Parameters
| Parameter | Description |
|---|---|
| Source Data Snowflake Stage Database | Snowflake Database where the source data Stage object is managed |
| Source Data Snowflake Stage Schema | Snowflake Schema where the source data Stage object is managed |
| Source Data Snowflake Stage Type | Either S3, AZURE_BLOB, or GCS |
| Source Data Snowflake Stage Name | Name of the External Stage in Snowflake pointing at either S3, Azure Blob, or GCS |
| Source Data File Pattern | Regex to match files in S3, Azure Blob, or GCS. Note: the pattern is matched against the complete path of the file, not just the part relative to the directory configured in the Snowflake External Stage, so the subfolder containing the objects to load must be matched here (see the COPY sketch after this table) |
| Source Data File Type | Either PARQUET, AVRO, or ORC |
| Target Database | Snowflake Database where the target table and temporary objects will be created |
| Target Schema | Snowflake Schema where the target table and temporary objects will be created |
| Target Table Name | Name of the target table in Snowflake where the data will be loaded |
| Force Load Old Files | By default, to avoid duplication, files which have already been loaded into Snowflake will not be loaded into the same table again. Set to True to reload all files in the location |
| Delete Schema Inference Table | Set to False to keep the temporary table containing the schema inference information |
| datafile_metadata | Intermediate job grid variable. Leave blank |
| target_table_metadata | Intermediate job grid variable. Leave blank |
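The Source Data File Pattern and Force Load Old Files parameters correspond to the PATTERN and FORCE options of Snowflake's COPY INTO statement. A hedged sketch of the equivalent SQL, with placeholder stage and table names:

```sql
-- Placeholder names; PATTERN is matched against the complete file path.
COPY INTO my_target_table
  FROM @my_external_stage
  PATTERN = '.*sales/.*\\.parquet'           -- Source Data File Pattern
  FILE_FORMAT = (TYPE = PARQUET)             -- Source Data File Type
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE    -- map file columns onto the inferred table columns
  FORCE = FALSE;                             -- TRUE reloads already-loaded files (Force Load Old Files)
```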
Prerequisites
You must configure an External Stage in Snowflake that points to the location (S3 / Azure Blob / GCS) containing the source data files. You can do this in the Matillion ETL UI by right-clicking an Environment and selecting Manage Stages.
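If you prefer to create the stage in SQL instead, a minimal sketch (the stage name, URL, and integration name are placeholders, and the storage integration setup is omitted):

```sql
-- Placeholder names; assumes a storage integration has already been configured.
CREATE STAGE my_external_stage
  URL = 's3://my-bucket/source-data/'
  STORAGE_INTEGRATION = my_s3_integration;
```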
Downloads
Licensed under: Matillion Free Subscription License
- Download METL-aws-sf-1.68.3-dynamic-load-and-flatten.melt
- Platform: AWS|Azure|GCP
- Target: Snowflake
- Version: 1.68.3 or higher