Author: Matillion
Date Posted: Oct 26, 2023
Last Modified: Nov 22, 2023
Read S3 Metadata
Read metadata from one file in a bucket in AWS S3 cloud storage.
The following items of metadata are read:
- ContentLength
- LastModified
- ETag
- ContentType
The shared job fails if the bucket cannot be found, if the file does not exist, or if the AWS identity lacks permission to read metadata from S3 objects.
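The four items listed above have the same names as fields in the response of the boto3 `head_object` call, which reads object metadata without downloading the object. The following is a minimal sketch of such a lookup, not the shared job's actual source. The function name `read_s3_metadata` is hypothetical, and converting every value to text (including ISO 8601 for the timestamp) is an assumption made to match the four string columns of the Metadata grid.

```python
from datetime import datetime

# Column order assumed to match the Metadata grid variable.
METADATA_KEYS = ("ContentLength", "LastModified", "ETag", "ContentType")


def read_s3_metadata(client, bucket, key):
    """Return the four metadata items as a list of strings (hypothetical helper)."""
    # head_object returns metadata only; botocore raises ClientError when the
    # object cannot be found or the caller is not allowed to read it.
    response = client.head_object(Bucket=bucket, Key=key)
    row = []
    for name in METADATA_KEYS:
        value = response.get(name, "")
        # boto3 returns LastModified as a datetime; the text format is an assumption.
        if isinstance(value, datetime):
            value = value.isoformat()
        row.append(str(value))
    return row
```

With boto3 installed, `client` would typically be created with `boto3.client("s3")`. Passing the client in keeps the function easy to exercise with a stand-in object.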
Parameters
Parameter | Description |
---|---|
Access Key | Optional AWS Access Key. Leave blank or set to a single dash (-) to authenticate using EC2 instance credentials (preferred). |
Secret Access Key | Optional AWS Secret Access Key. Leave blank or set to a single dash (-) to authenticate using EC2 instance credentials (preferred). |
Bucket Name | The S3 bucket name (do not include the s3:// prefix, do not include the object path) |
Object Name | The S3 object name (including path) |
Metadata | A grid variable with four string columns: ContentLength, LastModified, ETag, ContentType. Use the Grid Export tab to map this onto a grid variable in your own Orchestration Job |
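If the file location is held as a single `s3://` URI, it has to be split before it can be passed to the Bucket Name and Object Name parameters. The helper below is an illustrative sketch only; `split_s3_uri` is a hypothetical name and is not part of the shared job.

```python
def split_s3_uri(uri):
    """Split 's3://bucket/path/to/object' into (bucket_name, object_name)."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"Not an S3 URI: {uri!r}")
    # Everything up to the first slash is the bucket; the rest is the object path.
    bucket_name, _, object_name = uri[len(prefix):].partition("/")
    if not bucket_name or not object_name:
        raise ValueError(f"S3 URI needs both a bucket and an object path: {uri!r}")
    return bucket_name, object_name


bucket_name, object_name = split_s3_uri("s3://my-example-bucket/landing/2023/orders.csv")
```

Here `bucket_name` carries no prefix and no path, and `object_name` includes the full path, which matches what the two parameters expect.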
Retrieving the metadata
Create a Grid Variable with four text columns in your Orchestration job.
Do not add any Default Values.
In the Grid Export tab of the Read S3 Metadata shared job, press the + button and map the four columns from gv_metadata into your own grid variable.
After the Shared Job has run successfully, the four items of metadata are exported to your own grid variable.
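Once exported, the values can be consumed in a later step. The sketch below assumes the grid variable reaches Python as a list of rows, each row a list of four strings in the column order ContentLength, LastModified, ETag, ContentType, and that the grid holds exactly one row because the job reads one file. Both points are assumptions, and `metadata_from_grid` is a hypothetical helper name.

```python
COLUMNS = ("ContentLength", "LastModified", "ETag", "ContentType")


def metadata_from_grid(grid):
    """Turn the exported one-row grid into a dict keyed by column name."""
    if len(grid) != 1 or len(grid[0]) != len(COLUMNS):
        raise ValueError("Expected exactly one row with four columns")
    metadata = dict(zip(COLUMNS, grid[0]))
    # ContentLength is exported as text; convert it for numeric comparisons.
    metadata["ContentLength"] = int(metadata["ContentLength"])
    # S3 reports the ETag wrapped in double quotes; strip them if present.
    metadata["ETag"] = metadata["ETag"].strip('"')
    return metadata


# Example grid contents (made-up values, for illustration only).
example = metadata_from_grid([["1024", "2023-10-26T12:00:00+00:00", '"abc123"', "text/csv"]])
```

A downstream step could then, for instance, skip a load when `example["ContentLength"]` is zero.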
Using instance credentials on AWS
When hosted on AWS, give the EC2 instance the permission to use S3. Refer to the Prerequisites section for more information. This Shared Job will inherit the privileges. Set the Access Key and Secret Access Key parameters to a single dash, or leave them blank.
Usage on Azure or GCP
When hosted on Azure or GCP, permissions cannot be inherited. You must supply an Access Key and a Secret Access Key.
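The blank-or-dash convention for the two key parameters can be sketched as follows. This is an illustration of the rule described above rather than the shared job's own code. `credential_kwargs` is a hypothetical helper, and rejecting a half-supplied pair is an assumption. The keyword names `aws_access_key_id` and `aws_secret_access_key` are the ones accepted by `boto3.client`.

```python
def credential_kwargs(access_key, secret_key):
    """Map the two parameters to keyword arguments for boto3.client."""

    def is_blank(value):
        # Blank, missing, or a single dash all mean "no explicit key".
        return value is None or value.strip() in ("", "-")

    if is_blank(access_key) and is_blank(secret_key):
        # No explicit keys: boto3 falls back to its default credential chain,
        # which on EC2 includes the instance's role credentials.
        return {}
    if is_blank(access_key) or is_blank(secret_key):
        raise ValueError("Supply both the Access Key and the Secret Access Key, or neither")
    return {
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
    }
```

With boto3 installed, the result would be passed along as `boto3.client("s3", **credential_kwargs(access_key, secret_key))`.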
Prerequisites
This shared job requires Python 3.8.
To avoid a ModuleNotFoundError, the following Python libraries must be available:
- boto3
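A quick way to confirm these prerequisites before running the job is to check the interpreter version and whether the library can be found. The sketch below is illustrative; `missing_modules` and `check_prerequisites` are hypothetical names, and reading "requires Python 3.8" as "3.8 or later" is an assumption.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_prerequisites(required=("boto3",), min_python=(3, 8)):
    """Return a list of human-readable problems; an empty list means all clear."""
    problems = []
    if sys.version_info < min_python:
        found = ".".join(str(part) for part in sys.version_info[:3])
        problems.append(f"Python {min_python[0]}.{min_python[1]} is required, found {found}")
    for name in missing_modules(required):
        problems.append(f"Python library not available: {name}")
    return problems
```

Calling `check_prerequisites()` in the environment that runs the job returns an empty list when boto3 is installed and the interpreter is recent enough.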
When running on AWS, permissions can be inherited from the Matillion ETL instance. This is preferred. In this case, set the Access Key and Secret Access Key parameters to a single dash, or leave them blank. Ensure that the EC2 instance credentials attached to your Matillion ETL instance include the privilege to read from S3. For more information, refer to the “IAM in AWS” section in this article on RBAC in the Cloud.
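As a rough guide to what "the privilege to read from S3" can look like, the sketch below builds a small IAM policy document. It assumes the metadata is read with the S3 HeadObject operation, which is authorized by the `s3:GetObject` action. The bucket name and the helper `read_metadata_policy` are hypothetical, and your organization's policy may need to be broader or narrower.

```python
import json


def read_metadata_policy(bucket_name):
    """Build an IAM policy document allowing object reads in one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # HeadObject requests are authorized by s3:GetObject.
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }


policy = read_metadata_policy("my-example-bucket")
policy_json = json.dumps(policy, indent=2)
```

The resulting `policy_json` text could be attached to the IAM role used by the EC2 instance.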
When running on other platforms, authentication is done using AWS access keys for programmatic access to AWS. Supply both the Access Key and the Secret Access Key parameters.
Downloads
Licensed under: Matillion Free Subscription License
- Download read-s3-metadata.melt
- Target: Any target cloud data platform
- Version: 1.68.3 or higher