Matillion ETL Shared Job

Author: Matillion
Date Posted: Nov 17, 2023
Last Modified: Nov 17, 2023

Table Data Profiler

Automated column level data profiling for any table.

This shared job allows you to profile the data in any table, from a Matillion ETL orchestration job.

Table Data Profiler

The information is captured into a table named metl_profiling_data, which will be created if necessary. Columns include:

  • schema_name
  • table_name
  • collection_date
  • metric
  • value

Several metrics are collected for every column, depending on the datatype. Built-in metrics include count, count distinct, minimum and maximum values, and the number of missing (NULL) values. For numeric columns the sum, mean, standard deviation and median values are collected per column, along with the upper and lower interquartile ranges.

Profiling Data

Run this job every day to build up profiling metrics over time. You will then be able to detect trends. This can be helpful for data quality monitoring, preemptive error detection, and for runtime prediction.

For tables with many rows, a random sample of one million records is evaluated.

You can find much more information in this article on Data Profiling using Matillion Grid Variables.

Parameters

ParameterDescription
Table NameName of table to be profiled
Schema NameSchema Name, or ${environment_default_schema} to use the environment default
Value LengthThe maximum length that will be recorded in the MIN and MAX metrics. Applies to long string columns

Downloads

Licensed under: Matillion Free Subscription License

Installation Instructions

How to Install a Matillion ETL Shared Job