Matillion ETL Shared Job

Read S3 Metadata

Read metadata from one file in a bucket in AWS S3 cloud storage.

The following items of metadata are read:

  • ContentLength
  • LastModified
  • ETag
  • ContentType

The shared job will fail if the bucket is not known, if the file does not exist, or if the AWS identity lacks the privilege to read metadata from S3 objects.
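As an illustration of where these four items come from, the sketch below reads them with the boto3 `head_object` call, which returns object metadata without downloading the object body. That the shared job uses this exact call is an assumption; the function name `read_s3_metadata` and the `client` argument are hypothetical names chosen for this example.

```python
# Minimal sketch, not the shared job's actual implementation.
# Assumption: `client` is a boto3 S3 client, e.g. boto3.client("s3").

METADATA_KEYS = ("ContentLength", "LastModified", "ETag", "ContentType")


def read_s3_metadata(client, bucket_name, object_name):
    """Return the four metadata items for one S3 object as a dict."""
    # head_object raises botocore.exceptions.ClientError when the bucket
    # or object cannot be found, or when the caller is not permitted to
    # read the object's metadata.
    response = client.head_object(Bucket=bucket_name, Key=object_name)
    # Keep only the four items listed above; ignore the rest of the response.
    return {key: response.get(key) for key in METADATA_KEYS}
```

With boto3 installed and credentials available, a call might look like `read_s3_metadata(boto3.client("s3"), "my-example-bucket", "path/to/file.csv")`, where the bucket and object names are placeholders.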

Parameters

  • Access Key: Optional AWS Access Key. Leave blank or set to - to authenticate using EC2 instance credentials (preferred).
  • Secret Access Key: Optional AWS Secret Access Key. Leave blank or set to - to authenticate using EC2 instance credentials (preferred).
  • Bucket Name: The S3 bucket name (do not include the s3:// prefix or the object path).
  • Object Name: The S3 object name (including path).
  • Metadata: A grid variable with four string columns: ContentLength, LastModified, ETag, ContentType. Use the Grid Export tab to map this onto a grid variable in your own Orchestration Job.

Retrieving the metadata

Create a Grid Variable with four text columns in your Orchestration job.

[Image: Grid Variable With Four Columns]

Do not add any Default Values.

In the Grid Export tab of the Read S3 Metadata shared job, press the + button and map the four columns from gv_metadata into your own grid variable.

[Image: Grid Variable Export Mapping]

After the Shared Job has run successfully, the four items of metadata are exported to your own grid variable.
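To make the shape of the exported data concrete, the sketch below turns a metadata dictionary into a single grid row of four strings, in the column order ContentLength, LastModified, ETag, ContentType. The function name and the ISO 8601 formatting of LastModified are assumptions made for this example; the text format the shared job actually produces is not specified here.

```python
from datetime import datetime

GRID_COLUMNS = ("ContentLength", "LastModified", "ETag", "ContentType")


def metadata_to_grid_rows(metadata):
    """Convert a metadata dict into a one-row grid of four strings."""
    row = []
    for column in GRID_COLUMNS:
        value = metadata.get(column)
        if isinstance(value, datetime):
            # Assumption: timestamps are rendered as ISO 8601 text.
            value = value.isoformat()
        # Grid variable columns are text, so every value becomes a string.
        row.append("" if value is None else str(value))
    # A grid is a list of rows; this job describes a single file, so one row.
    return [row]
```

In a Matillion ETL Python Script component, a list of rows like this is the shape accepted by `context.updateGridVariable`; whether the shared job populates gv_metadata this way is an assumption.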

Using instance credentials on AWS

When Matillion ETL is hosted on AWS, grant the EC2 instance permission to read from S3. Refer to the Prerequisites section for more information. This Shared Job inherits those privileges. Set the Access Key and Secret Access Key parameters to a single dash (-), or leave them blank.

[Image: Read S3 Metadata on AWS]

Usage on Azure or GCP

When hosted on Azure or GCP, permissions cannot be inherited. You must supply both an Access Key and a Secret Access Key.

[Image: Read S3 Metadata on Azure or GCP]
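The two authentication modes described above can be sketched as a small helper that decides which keyword arguments to pass to `boto3.client`. This is an illustrative sketch, not the shared job's code: the function name `build_client_kwargs` is hypothetical, and raising an error when only one of the two keys is supplied is a design choice made for this example.

```python
def _is_unset(value):
    """Treat None, blank, or a single dash as 'not supplied'."""
    return value is None or value.strip() in ("", "-")


def build_client_kwargs(access_key, secret_access_key):
    """Return keyword arguments for boto3.client("s3", **kwargs)."""
    if _is_unset(access_key) and _is_unset(secret_access_key):
        # No explicit keys: boto3 falls back to its default credential
        # chain, which includes EC2 instance credentials on AWS.
        return {}
    if _is_unset(access_key) or _is_unset(secret_access_key):
        # Assumption for this sketch: a half-supplied key pair is an error.
        raise ValueError("Supply both Access Key and Secret Access Key, or neither")
    return {
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_access_key,
    }
```

The result would then be used as `boto3.client("s3", **build_client_kwargs(access_key, secret_access_key))`. Returning an empty dictionary for the blank or dash case keeps the instance-credential path free of any key handling.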

Prerequisites

This shared job requires Python 3.8.

To avoid a ModuleNotFoundError, the following Python libraries must be available:

  • boto3
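A quick way to confirm these prerequisites before installing the shared job is to run a short check in the Python environment that Matillion ETL uses. The sketch below is one possible check; the function names are hypothetical, and treating 3.8 as a minimum version (rather than an exact one) is an assumption.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the subset of module names that cannot be found."""
    # find_spec returns None for a top-level module that is not installed,
    # without importing it.
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_version_ok(minimum=(3, 8)):
    """Check the running interpreter against a minimum (major, minor)."""
    return sys.version_info >= minimum
```

Calling `missing_modules(["boto3"])` returns an empty list when boto3 is available, and `["boto3"]` when importing it would raise ModuleNotFoundError.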

When running on AWS, permissions can be inherited from the Matillion ETL instance, which is the preferred approach. In this case, set the Access Key and Secret Access Key parameters to a single dash (-), or leave them blank. Ensure that the EC2 instance credentials attached to your Matillion ETL instance include the privilege to read from S3. For more information, refer to the “IAM in AWS” section in this article on RBAC in the Cloud.

When running on other platforms, authentication is done using AWS access keys for programmatic access to AWS. Supply both the Access Key and the Secret Access Key parameters.


Downloads

Licensed under: Matillion Free Subscription License

Download read-s3-metadata.melt

  • Target: Any target cloud data platform
  • Version: 1.68.3 or higher

Installation Instructions

How to Install a Matillion ETL Shared Job
Author: Matillion
Date Posted: Oct 26, 2023
Last Modified: Nov 22, 2023