Data Productivity Cloud Pipeline

Author: Ian Funnell
Date Posted: Jan 27, 2025
Last Modified: Jan 27, 2025

Load Staged Table CSV Schema Inference

Create and load a new database table from a staged CSV file, using schema inference.

Begin by copying your CSV file into staging, for example using the PUT command.
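For example, the PUT command can be run from SnowSQL or another client that supports it. This is only a sketch: the local file path is illustrative, and AUTO_COMPRESS gzips the file on upload so it lands as @~/csv/myfile.csv.gz, matching the pv_source_path example used below.

```sql
-- Upload a local CSV file into the user stage folder used in this document.
-- file:///tmp/myfile.csv is an illustrative local path.
PUT file:///tmp/myfile.csv @~/csv/ AUTO_COMPRESS = TRUE;
```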

In a Data Productivity Cloud orchestration pipeline, create three components:

  • Create or Replace a new File Format appropriate for your CSV file
  • Run Load Staged CSV Schema Inference (this pipeline)
  • Continue on to transform and integrate the newly loaded data
[Image: Load Staged Table using CSV Schema Inference]

Successful schema inference requires that your CSV file has a header row as its first line, for example:

iata,airport,city,country,lat,long,state
ABQ,Albuquerque International,Albuquerque,USA,35.04022222,-106.60919440,NM
ALB,Albany Cty,Albany,USA,42.74811944,-73.80297861,NY
etc

Parameters

Parameter              Description
pv_file_format_name    Name of your File Format (must be in UPPERCASE)
pv_source_path         Path to your staged file, e.g. @~/csv/myfile.csv.gz
pv_staging_table_name  Name of the staging table to create

Snowflake schema inference

Snowflake makes schema inference capabilities available through the INFER_SCHEMA function.
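As a sketch, the inferred column definitions can be inspected directly with a query like the one below. The File Format name MY_CSV_FORMAT is illustrative (in the pipeline it is supplied via pv_file_format_name), and for CSV files the File Format must have PARSE_HEADER = TRUE at the time INFER_SCHEMA runs.

```sql
-- Returns one row per inferred column (COLUMN_NAME, TYPE, NULLABLE, ...).
SELECT *
FROM TABLE(
  INFER_SCHEMA(
    LOCATION    => '@~/csv/myfile.csv.gz',
    FILE_FORMAT => 'MY_CSV_FORMAT'
  )
);
```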

Your CSV file must have a header row as its first line. Set SKIP_HEADER=1 in your File Format.

[Image: Set Skip Header to 1]
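A minimal File Format meeting this requirement might look like the following. The name MY_CSV_FORMAT and the quoting/compression options are illustrative; use whatever matches your file and pass the actual name via pv_file_format_name.

```sql
-- Example File Format for a CSV with a single header row
CREATE OR REPLACE FILE FORMAT MY_CSV_FORMAT
  TYPE = CSV
  SKIP_HEADER = 1
  FIELD_OPTIONALLY_ENCLOSED_BY = '"'
  COMPRESSION = AUTO;
```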

This pipeline runs the following steps:

  • Set PARSE_HEADER=true in the File Format
  • Create or Replace the staging table, with schema inferred from the CSV file
  • Load the data from the CSV file into the newly created table
  • Reset the File Format
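The steps above correspond roughly to the SQL below, shown only as a hedged sketch of what the pipeline does (object names are illustrative). Snowflake does not allow SKIP_HEADER together with PARSE_HEADER = TRUE, which is why the pipeline toggles the File Format for inference and loading, then resets it afterwards.

```sql
-- 1. Switch the File Format into header-parsing mode for inference
ALTER FILE FORMAT MY_CSV_FORMAT SET SKIP_HEADER = 0, PARSE_HEADER = TRUE;

-- 2. Create (or replace) the staging table from the inferred schema
CREATE OR REPLACE TABLE MY_STAGING_TABLE
  USING TEMPLATE (
    SELECT ARRAY_AGG(OBJECT_CONSTRUCT(*))
    FROM TABLE(
      INFER_SCHEMA(
        LOCATION    => '@~/csv/myfile.csv.gz',
        FILE_FORMAT => 'MY_CSV_FORMAT'
      )
    )
  );

-- 3. Load the file; MATCH_BY_COLUMN_NAME maps header names to table columns
COPY INTO MY_STAGING_TABLE
  FROM '@~/csv/myfile.csv.gz'
  FILE_FORMAT = (FORMAT_NAME = 'MY_CSV_FORMAT')
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;

-- 4. Reset the File Format for ordinary header-skipping loads
ALTER FILE FORMAT MY_CSV_FORMAT SET PARSE_HEADER = FALSE, SKIP_HEADER = 1;
```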

Downloads

Licensed under: Matillion Free Subscription License

Installation Instructions

How to Install a Data Productivity Cloud Pipeline