Data Productivity Cloud Pipeline

Author: Ian Funnell
Date Posted: Jan 27, 2025
Last Modified: Jan 27, 2025

Load Staged Table CSV Schema Inference

Create and load a new database table from a staged CSV file, using schema inference.

Begin by copying your CSV file into staging, for example using the PUT command.
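For example, the PUT command can be run from SnowSQL or another client that supports it. This is only a sketch: the local file path is illustrative, and AUTO_COMPRESS gzips the file on upload so it lands as @~/csv/myfile.csv.gz, matching the pv_source_path example used below.

```sql
-- Upload a local CSV file into the user stage folder used in this document.
-- file:///tmp/myfile.csv is an illustrative local path.
PUT file:///tmp/myfile.csv @~/csv/ AUTO_COMPRESS = TRUE;
```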

In a Data Productivity Cloud orchestration pipeline, create three components:

  • Create or Replace a new File Format appropriate for your CSV file
  • Run Load Staged CSV Schema Inference (this pipeline)
  • Continue on to transform and integrate the newly loaded data
[Image: Load Staged Table using CSV Schema Inference]

Successful schema inference requires that your CSV file has a header row as its first line, for example:

iata,airport,city,country,lat,long,state
ABQ,Albuquerque International,Albuquerque,USA,35.04022222,-106.60919440,NM
ALB,Albany Cty,Albany,USA,42.74811944,-73.80297861,NY
etc

Parameters

Parameter              Description
pv_file_format_name    Name of your File Format (must be in UPPERCASE)
pv_source_path         Path to your staged file, e.g. @~/csv/myfile.csv.gz
pv_staging_table_name  Name of the staging table to create

Snowflake schema inference

Snowflake makes schema inference capabilities available through the INFER_SCHEMA function.
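As a sketch, the inferred column definitions can be inspected directly with a query like the one below. The File Format name MY_CSV_FORMAT is illustrative (in the pipeline it is supplied via pv_file_format_name), and for CSV files the File Format must have PARSE_HEADER = TRUE at the time INFER_SCHEMA runs.

```sql
-- Returns one row per inferred column (COLUMN_NAME, TYPE, NULLABLE, ...).
SELECT *
FROM TABLE(
  INFER_SCHEMA(
    LOCATION    => '@~/csv/myfile.csv.gz',
    FILE_FORMAT => 'MY_CSV_FORMAT'
  )
);
```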

Your CSV file must have a header row as its first line. Set SKIP_HEADER=1 in your File Format.

[Image: Set Skip Header to 1]
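A minimal File Format meeting this requirement might look like the following. The name MY_CSV_FORMAT and the quoting/compression options are illustrative; use whatever matches your file and pass the actual name via pv_file_format_name.

```sql
-- Example File Format for a CSV with a single header row
CREATE OR REPLACE FILE FORMAT MY_CSV_FORMAT
  TYPE = CSV
  SKIP_HEADER = 1
  FIELD_OPTIONALLY_ENCLOSED_BY = '"'
  COMPRESSION = AUTO;
```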

This pipeline runs the following steps:

  • Set PARSE_HEADER=true in the File Format
  • Create or Replace the staging table, with schema inferred from the CSV file
  • Load the data from the CSV file into the newly created table
  • Reset the File Format
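The steps above correspond roughly to the SQL below, shown only as a hedged sketch of what the pipeline does (object names are illustrative). Snowflake does not allow SKIP_HEADER together with PARSE_HEADER = TRUE, which is why the pipeline toggles the File Format for inference and loading, then resets it afterwards.

```sql
-- 1. Switch the File Format into header-parsing mode for inference
ALTER FILE FORMAT MY_CSV_FORMAT SET SKIP_HEADER = 0, PARSE_HEADER = TRUE;

-- 2. Create (or replace) the staging table from the inferred schema
CREATE OR REPLACE TABLE MY_STAGING_TABLE
  USING TEMPLATE (
    SELECT ARRAY_AGG(OBJECT_CONSTRUCT(*))
    FROM TABLE(
      INFER_SCHEMA(
        LOCATION    => '@~/csv/myfile.csv.gz',
        FILE_FORMAT => 'MY_CSV_FORMAT'
      )
    )
  );

-- 3. Load the file; MATCH_BY_COLUMN_NAME maps header names to table columns
COPY INTO MY_STAGING_TABLE
  FROM '@~/csv/myfile.csv.gz'
  FILE_FORMAT = (FORMAT_NAME = 'MY_CSV_FORMAT')
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;

-- 4. Reset the File Format for ordinary header-skipping loads
ALTER FILE FORMAT MY_CSV_FORMAT SET PARSE_HEADER = FALSE, SKIP_HEADER = 1;
```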

Downloads

Licensed under: Matillion Free Subscription License

Installation Instructions

How to Install a Data Productivity Cloud Pipeline