Data Productivity Cloud Pipeline

Transpose or pivot data between wide and narrow representations.

Three Data Productivity Cloud pipelines that demonstrate techniques for transposing (pivoting) data between wide and narrow representations.

Open example transpose extract and load first.

Example setup

Image ofData Transposing / Pivoting Orchestration
Data Transposing / Pivoting Orchestration

When you run this pipeline, it will copy (colocate) the sample data from our recent assessment of the UK’s Electric Vehicle infrastructure into a new table named stg_batteryelectric_wide. This is a wide table:

MakeModel2022 Q32022 Q22022 Q12021 Q42010 Q22010 Q1
AIXAMAIXAM MEGA5555155154
ASTON MARTINASTON MARTIN MODEL MISSING[low][low][c][low][low][low]
etc

The data contains some values that are not numeric, either because they were estimated, or were confidential. Also there are values such as 18,974 which are formatted strings rather than real numbers. These will be problematic when it comes to summing the values later. In other words the data is accurate and of high quality, but it needs to be made more consumable so it can be manipulated and interpreted more easily.

There are more than 50 of these “value” columns. So to avoid a lot of hand coding, and therefore to improve productivity, the pipeline automatically sets several variables for the UNPOVOT and PIVOT transformations that will follow.

Data Transposing Wide to Narrow

The first of the two transformation pipelines transposes the data into its narrow format, and creates a new table named stg_batteryelectric_narrow:

Image ofData Transposing Wide to Narrow
Data Transposing Wide to Narrow

In its narrow format the table retains the two identity columns, but all the values are transposed into date / count pairs like this:

MakeModelQuarterVehicles
AIXAMAIXAM MEGA2022 Q35
AIXAMAIXAM MEGA2022 Q25
AIXAMAIXAM MEGA2022 Q15
AIXAMAIXAM MEGA2021 Q45
AIXAMAIXAM MEGA2021 Q26
VOLKSWAGENVOLKSWAGEN UP2014 Q390
VOLKSWAGENVOLKSWAGEN UP2014 Q260
VOLKSWAGENVOLKSWAGEN UP2014 Q154
VOLKSWAGENVOLKSWAGEN UP2013 Q430

It’s exactly the same data, but expressed using more rows and less columns. An important DataOps consideration is that this narrow format offers some protection against schema drift. Adding a new quarter will not require a modification to this table’s structure.

The calculator, filter and rename components remove values such as [low] and [c] that are not numbers. They will be replaced with NULL, which will act as a zero when summed. Strings that look like numbers are replaced by actual numbers.

Data Transposing Narrow to Wide

The second of the two transformation pipelines starts with the narrow format data and pivots it back into wide format at two different granularities:

Image ofData Transposing Narrow to Wide
Data Transposing Narrow to Wide

The new stg_batteryelectric_make_model table is almost identical to the original data. The only differences are that the non-numeric values (like [low] and [c]) are gone, and all the value columns are numbers rather than strings.

MakeModelCOUNT_2021_Q3COUNT_2021_Q2
AUDIAUDI E-TRON1259510422
AUDIAUDI Q41414252
BMWBMW I31389513054
BMWBMW I4
BMWBMW IX19
BMWBMW IX3862
BMWBMW MODEL MISSING98

In the two PIVOT components you’ll notice the expression for the new cells is SUM("Vehicles"). That means it’s possible to aggregate, or sum, the values at a lower granularity. Removing the Model column results in the new stg_batteryelectric_make table:

MakeCOUNT_2021_Q3COUNT_2021_Q2
AUDI1400910674
BMW1478513062

Downloads

Licensed under: Matillion Free Subscription License

Download example transpose extract and load.orch.yaml

Download example transpose transformation 1 wide to narrow.tran.yaml

Download example transpose transformation 2 narrow to wide.tran.yaml

Installation instructions

How to Install a Data Productivity Cloud Pipeline

Author: Matillion