Author: Matillion
Date Posted: Jun 17, 2024
Last Modified: Jun 21, 2024
Olympic Athlete Data Analysis
Analyze historical Olympic athlete data with this set of Data Productivity Cloud pipelines.
These pipelines find the most successful athletes, by medal count, in a historical Olympic Games of your choice.
Load Historical Olympic Athlete Data
Two datafiles are used:
s3://devrel.matillion.com/data/structured/olympics/athlete_events.csv.gz
- all the athletes in every Olympic event, as a CSV separated by \r\n, with one header lines3://devrel.matillion.com/data/structured/olympics/noc_regions.csv.gz
- the National Olympic Committee country codes, as a CSV separated by \r, with one header line
The pipeline Olympic Data - Extract Load
loads these two datafiles into database tables olympic_athletes
and noc_region
.
Top 10 athletes from the 2016 Summer Games
The pipeline Olympic Data - Transform
(in the screenshot at the top) performs the data analysis:
Athletes by country
joins the two tables on the commonNOC
columnFilter Medal Winners
filters only medal winners from the Summer 2016 games
Calculate Medal Points
is an expression that allocates 3 points for a gold medal, 2 for a silver and 1 for a bronze- The
Aggregate
component groups the results by athlete, summing their medal points Rank Athletes
performs a RANK() windowing function to rank the athletes by total medal points descendingTop 10 Athletes
filters only the top 10 athletes by rank
The top 10 athletes are saved into a table named top_10_athletes_summer_2016
Generate bio using OpenAI, Amazon Bedrock or Cortex Completions
Having saved the top 10 athletes into a new database table, you can use generative AI to create a short bio for them. The Large Language Model (LLM) prompt in these examples is as follows:
You are a reporter on athletic events. Write a short bio on how the named athlete performed in the Summer 2016 Olympic Games
All Data Productivity Cloud users have the choices including OpenAI and Amazon Bedrock:
- To use OpenAI, save your API Key as a secret named
OpenAI-Key
- To use Amazon Bedrock, ensure that your Agent has IAM permission for Bedrock, and that you have requested access to the foundation model you intend to use
Snowflake users can additionally choose Cortex Completions for the same purpose, directly within the transformation pipeline:
Downloads
Licensed under: Matillion Free Subscription License
- Download Olympic-Athlete-Data-Analysis-Redshift.zip
- Target: Redshift
- Download Olympic-Athlete-Data-Analysis-Snowflake.zip
- Target: Snowflake