Author: Matillion
Date Posted: Jun 17, 2024
Last Modified: Dec 11, 2024
Olympic Athlete Data Analysis
Analyze historical Olympic athlete data with this set of Data Productivity Cloud pipelines.
These pipelines find the most successful athletes, by medal count, in a historical Olympic Games of your choice.
Load Historical Olympic Athlete Data
Two datafiles are used:
s3://devrel.matillion.com/data/structured/olympics/athlete_events.csv.gz
- all the athletes in every Olympic event, as a CSV separated by \r\n, with one header lines3://devrel.matillion.com/data/structured/olympics/noc_regions.csv.gz
- the National Olympic Committee country codes, as a CSV separated by \r, with one header line
The pipeline Olympic Data - Extract Load
loads these two datafiles into database tables olympic_athletes
and noc_region
.
Top 10 athletes from the 2016 Summer Games
The pipeline Olympic Data - Transform
(in the screenshot at the top) performs the data analysis:
Athletes by country
joins the two tables on the commonNOC
columnFilter Medal Winners
filters only medal winners from the Summer 2016 games
Calculate Medal Points
is an expression that allocates 3 points for a gold medal, 2 for a silver and 1 for a bronze- The
Aggregate
component groups the results by athlete, summing their medal points Rank Athletes
performs a RANK() windowing function to rank the athletes by total medal points descendingTop 10 Athletes
filters only the top 10 athletes by rank
The top 10 athletes are saved into a table named top_10_athletes_summer_2016
Generate bio using OpenAI, Amazon Bedrock or Cortex Completions
Having saved the top 10 athletes into a new database table, you can use generative AI to create a short bio for them. The Large Language Model (LLM) prompt in these examples is as follows:
You are a reporter on athletic events. Write a short bio on how the named athlete performed in the Summer 2016 Olympic Games
All Data Productivity Cloud users have the choices including OpenAI and Amazon Bedrock:
- To use OpenAI, save your API Key as a secret named
OpenAI-Key
- To use Amazon Bedrock, ensure that your Agent has IAM permission for Bedrock, and that you have requested access to the foundation model you intend to use
Snowflake users can additionally choose Cortex Completions for the same purpose, directly within the transformation pipeline:
Lab Helpers
If you are following one of Matillion’s online labs, use the following to copy and paste:
Columns for olympic_athletes
ID VARCHAR Yes No
Name VARCHAR Yes No
Sex VARCHAR Yes No
Age VARCHAR Yes No
Height VARCHAR Yes No
Weight VARCHAR Yes No
Team VARCHAR Yes No
NOC VARCHAR Yes No
Games VARCHAR Yes No
Year VARCHAR Yes No
Season VARCHAR Yes No
City VARCHAR Yes No
Sport VARCHAR Yes No
Event VARCHAR Yes No
Medal VARCHAR Yes No
Columns for noc_region
NOC VARCHAR Yes No
Region VARCHAR Yes No
Notes VARCHAR No No
Column Mapping in the join between olympic_athletes
and noc_region
noc.NOC NOC
noc.Name noc_Name
ath.ID ID
ath.Name Name
ath.Sex Sex
ath.Age Age
ath.Height Height
ath.Weight Weight
ath.Team Team
ath.Games Games
ath.Year Year
ath.Season Season
ath.City City
ath.Sport Sport
ath.Event Event
ath.Medal Medal
Downloads
Licensed under: Matillion Free Subscription License
- Download Olympic-Athlete-Data-Analysis_Redshift.zip
- Target: Redshift
- Download Olympic-Athlete-Data-Analysis_Snowflake.zip
- Target: Snowflake