DataOps

Author: Matillion
Date Posted: Mar 22, 2025
Last Modified: Mar 22, 2025

GitHub Action Publish Artifact

Publish Data Productivity Cloud pipelines as an Artifact with this GitHub Action

The attached main.yml file contains a GitHub Action that uses the Data Productivity Cloud Public API to upload all pipeline files in a project and publish them as an artifact to a given environment.

Setup in GitHub

Add the YAML workflow file main.yml to .github/workflows/ in the root of your repository.

Add the following secrets to the repository under Settings > Secrets and variables > Actions:

  • MATILLION_PUBLIC_API_CLIENT_ID: Client ID for Matillion API access.
  • MATILLION_PUBLIC_API_CLIENT_SECRET: Client Secret for Matillion API access.

Add the following repository variables under Settings > Secrets and variables > Actions:

  • MATILLION_PROJECT_ID: The ID of the Matillion project you are working with.

Pipeline Setup

The workflow includes several hardcoded environment variables that need to be configured appropriately for your team and project. Below is an explanation of each variable, its purpose, and how to configure it:

Versioning

VERSION_PREFIX: v

  • Purpose: Sets the prefix for semantic versioning (e.g., v1.0.0).
  • Default: v
  • Customization: Update this if your versioning scheme requires a different prefix (e.g., release- for release-1.0.0).

Authentication

CLIENT_ID: ${{ secrets.MATILLION_PUBLIC_API_CLIENT_ID }}

  • Purpose: Matillion API Client ID for authentication.
  • Source: Stored securely in GitHub Secrets.
  • Setup:
    • Go to Settings > Secrets and variables > Actions.
    • Add MATILLION_PUBLIC_API_CLIENT_ID with the Client ID from your Matillion account.

CLIENT_SECRET: ${{ secrets.MATILLION_PUBLIC_API_CLIENT_SECRET }}

  • Purpose: Matillion API Client Secret for authentication.
  • Source: Stored securely in GitHub Secrets.
  • Setup:
    • Go to Settings > Secrets and variables > Actions.
    • Add MATILLION_PUBLIC_API_CLIENT_SECRET with the Client Secret from your Matillion account.

API Endpoints

TOKEN_URL: https://id.core.matillion.com/oauth/dpc/token

  • Purpose: Matillion OAuth2 token endpoint.
  • Default: This URL is specific to Matillion’s authentication system and should not need modification.

PROJECTS_URL: https://eu1.api.matillion.com/dpc/v1/projects

  • Purpose: Base URL for Matillion API requests.
  • Default: https://eu1.api.matillion.com/dpc/v1/projects
  • Customization:
    • If your Matillion account is in a different region, update this URL (e.g., https://us1.api.matillion.com/dpc/v1/projects for US regions).

Project and Environment

PROJECT_ID: ${{ vars.MATILLION_PROJECT_ID }}

  • Purpose: The ID of the Matillion project being managed by this workflow.
  • Source: Stored in GitHub Variables.
  • Setup:
    • Go to Settings > Secrets and variables > Actions > Repository Variables.
    • Add MATILLION_PROJECT_ID with your project’s ID from Matillion.

ENVIRONMENT_NAME: DataOps Whitepaper-production

  • Purpose: Human-readable name of the environment where pipelines are deployed.
  • Customization:
    • Replace DataOps Whitepaper-production with your target environment’s name.

ENVIRONMENT_NAME_URL_ENCODED: DataOps%20Whitepaper-production

  • Purpose: URL-encoded version of the ENVIRONMENT_NAME for API calls.
  • Customization:
    • Replace DataOps%20Whitepaper-production with the URL-encoded version of your environment name.
    • Use an online tool to encode spaces and special characters (e.g., My Environment becomes My%20Environment).
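
The encoded value can also be produced locally rather than with an online tool. A minimal sketch using Python's standard library; the function name encode_environment_name is hypothetical and not part of the workflow:

```python
from urllib.parse import quote


def encode_environment_name(name: str) -> str:
    """Percent-encode an environment name for use in a URL."""
    # safe="" also encodes "/", so the name cannot be read as a path separator.
    return quote(name, safe="")


if __name__ == "__main__":
    print(encode_environment_name("DataOps Whitepaper-production"))
    # prints: DataOps%20Whitepaper-production
```

The printed value is what would go into ENVIRONMENT_NAME_URL_ENCODED.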

Test Pipelines

TEST_PIPELINE_NAMES: dataops-orchestration-pipeline,test-pipeline-2

  • Purpose: List of pipelines to verify and execute during the testing phase of the workflow.
  • Customization:
    • Provide a comma-separated list of pipeline names (e.g., pipeline1,pipeline2).

Execution Timing

CHECK_INTERVAL: 15

  • Purpose: Time (in seconds) to wait before checking pipeline statuses.
  • Default: 15
  • Customization:
    • Adjust this based on the typical runtime of your pipelines to balance efficiency and API rate limits.

Note: The workflow currently does not poll. It checks each status only once after waiting, so CHECK_INTERVAL is effectively a timeout for the test pipeline step.

Overview of the GitHub Action

This GitHub Action automates the process of validating, deploying, and testing Data Productivity Cloud pipelines on the main branch.

It runs whenever a commit is pushed or merged to the main branch (and can also be triggered manually).

It consists of the following jobs:

Validate Pipelines

  • Checks out the repository.
  • Validates YAML files using yamllint.

Execute Pipelines

Runs only after the validate-pipelines job succeeds.

Performs several tasks:

  • Calculate Next Version: Determines the next semantic version by incrementing the patch version.
  • Generate Access Token: Authenticates with Matillion and generates a token for subsequent API calls.
  • Upload Artifacts and Publish to Default Environment: Uploads the pipeline files (.orch.yaml and .tran.yaml, along with any .py and .sql files) to Matillion as part of the artifact deployment.
  • Tag Main Branch: Tags the main branch with the new version using the calculated semantic version.
  • Verify Test Pipelines: Ensures the specified test pipelines are published in the environment.
  • Execute Test Pipelines: Executes the pipelines and retrieves execution IDs.
  • Wait for Pipeline Completion: Waits CHECK_INTERVAL seconds, then checks the status of the executed pipelines, ensuring all completed successfully.
  • All Pipelines Completed Successfully: Outputs success if all pipelines execute as expected.

Step-by-Step Explanation of Each Job

Validate Pipelines

  • Purpose: Ensures the syntax of YAML files in the repository is valid.
  • Command: Uses yamllint to validate all YAML files.

Calculate Next Version

  • Purpose: Calculates the next semantic version by inspecting existing tags and incrementing the patch version.
  • Logic:
    • Fetches all tags.
    • Identifies the latest tag (e.g., v1.0.0).
    • Increments the patch version to e.g. v1.0.1.
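
The tag logic above can be sketched as a small function. This is an illustration under stated assumptions, not the workflow's actual script: the name next_patch_version is hypothetical, "latest" is taken to mean the numerically highest version, and the starting version when no matching tag exists is a guess.

```python
import re


def next_patch_version(tags, prefix="v"):
    """Return the next patch version given the existing tag names."""
    pattern = re.compile(rf"^{re.escape(prefix)}(\d+)\.(\d+)\.(\d+)$")
    versions = []
    for tag in tags:
        match = pattern.match(tag)
        if match:
            # Compare as integers so that 1.0.10 sorts after 1.0.9.
            versions.append(tuple(int(part) for part in match.groups()))
    if not versions:
        # Assumption: begin at <prefix>0.0.1 when no matching tag exists.
        return f"{prefix}0.0.1"
    major, minor, patch = max(versions)
    return f"{prefix}{major}.{minor}.{patch + 1}"


if __name__ == "__main__":
    print(next_patch_version(["v0.9.12", "v1.0.0", "notes"]))
    # prints: v1.0.1
```

Tags that do not match the prefix and the three-part pattern are ignored, so unrelated tags do not affect the result.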

Generate Access Token

  • Purpose: Authenticates with Matillion and generates a bearer token for API calls.
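
As a rough illustration, the token request might be built as follows. The form fields follow the standard OAuth2 client-credentials grant. The audience value is an assumption that should be checked against the Matillion API documentation, and build_token_request is a hypothetical helper name. The sketch only constructs the request and does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

TOKEN_URL = "https://id.core.matillion.com/oauth/dpc/token"


def build_token_request(client_id: str, client_secret: str) -> Request:
    """Build (but do not send) an OAuth2 client-credentials token request."""
    form = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # Assumption: confirm the audience value in the Matillion docs.
        "audience": "https://api.matillion.com",
    }
    return Request(
        TOKEN_URL,
        data=urlencode(form).encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Sending the request with urllib.request.urlopen and reading the access_token field from the JSON response would complete the step. That field name comes from the OAuth2 specification rather than from this document.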

Upload Artifacts and Publish to Default Environment

  • Purpose: Uploads files (.orch.yaml, .tran.yaml, .py and .sql) to Matillion.
  • Implementation:
    • Finds all relevant files in the repository.
    • Constructs a single POST request with a multipart payload containing all files.
    • Includes headers for versionName, environmentName, and commitHash.
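
The file discovery and header construction described above could look roughly like this. The helper names are hypothetical, skipping the .git and .github directories is an assumption, and the upload endpoint itself is left out because its path is not given here.

```python
from pathlib import Path

# File suffixes listed for this step.
ARTIFACT_SUFFIXES = (".orch.yaml", ".tran.yaml", ".py", ".sql")


def find_artifact_files(root):
    """Return sorted repository-relative paths of files to upload."""
    root = Path(root)
    found = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        relative = path.relative_to(root)
        # Assumption: files under .git and .github are not pipeline files.
        if relative.parts[0] in {".git", ".github"}:
            continue
        if path.name.endswith(ARTIFACT_SUFFIXES):
            found.append(relative.as_posix())
    return sorted(found)


def artifact_headers(token, version, environment, commit_hash):
    """Build the request headers named in the step description."""
    return {
        "Authorization": f"Bearer {token}",
        "versionName": version,
        "environmentName": environment,
        "commitHash": commit_hash,
    }
```

Each discovered path would then become one part of the multipart payload sent in the single POST request.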

Tag Main Branch

  • Purpose: Tags the main branch with the calculated version.
  • Implementation:
    • Uses the GitHub API via actions/github-script to create the tag.
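
For reference, creating a tag through the GitHub REST API amounts to creating a Git reference. The sketch below builds the equivalent request in Python. The workflow itself uses actions/github-script (JavaScript), and whether it creates a lightweight reference like this one or an annotated tag is an assumption. The function name is hypothetical.

```python
import json


def build_tag_ref_request(owner, repo, version, commit_sha):
    """Return the URL and JSON body for creating a tag reference."""
    # GitHub REST API: POST /repos/{owner}/{repo}/git/refs
    url = f"https://api.github.com/repos/{owner}/{repo}/git/refs"
    body = {"ref": f"refs/tags/{version}", "sha": commit_sha}
    return url, json.dumps(body)


if __name__ == "__main__":
    url, body = build_tag_ref_request("my-org", "my-repo", "v1.0.1", "abc123")
    print(url)
    print(body)
```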

Verify Test Pipelines

  • Purpose: Ensures the specified test pipelines are published in the environment.
  • API: Calls the Matillion published-pipelines API and checks if all expected pipelines exist.
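
The existence check reduces to comparing the configured names against the published ones. A minimal sketch, assuming the published pipeline names have already been extracted from the API response (its exact shape is not shown here); missing_pipelines is a hypothetical helper name:

```python
def missing_pipelines(expected_csv, published_names):
    """Return the expected pipeline names that are not published."""
    # TEST_PIPELINE_NAMES is a comma-separated string; ignore stray spaces.
    expected = [name.strip() for name in expected_csv.split(",") if name.strip()]
    published = set(published_names)
    return [name for name in expected if name not in published]


if __name__ == "__main__":
    missing = missing_pipelines(
        "dataops-orchestration-pipeline,test-pipeline-2",
        ["dataops-orchestration-pipeline"],
    )
    print(missing)
    # prints: ['test-pipeline-2']
```

A non-empty result would be the condition under which the step fails.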

Execute Test Pipelines

  • Purpose: Executes the specified pipelines in the given environment.
  • API: Calls the Matillion pipeline-executions API to initiate execution for each pipeline.
  • Outputs: Collects pipelineExecutionId for each executed pipeline.
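
Collecting the execution IDs can be sketched as a loop over the pipeline names. To avoid guessing at request details, the API call is passed in as a hypothetical start_execution callable. Only the pipelineExecutionId field is taken from the description above.

```python
def execute_pipelines(pipeline_names, start_execution):
    """Start each pipeline and map its name to its pipelineExecutionId."""
    execution_ids = {}
    for name in pipeline_names:
        # start_execution is a hypothetical callable that wraps the
        # pipeline-executions API call and returns the parsed JSON response.
        response = start_execution(name)
        execution_ids[name] = response["pipelineExecutionId"]
    return execution_ids
```

Passing the API call as a parameter also makes the collection logic easy to exercise with a stub in place of the real service.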

Wait for Pipeline Completion

  • Purpose: Monitors the status of executed pipelines.
  • API: Waits CHECK_INTERVAL seconds, then calls the Matillion pipeline-executions status API to check execution status.
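
The wait-then-check behavior can be sketched as follows. The status lookup is a hypothetical get_status callable, and the "SUCCESS" status value is an assumption about what the API returns. The sleep function is a parameter only so the sketch can be exercised without actually waiting.

```python
import time


def check_after_wait(execution_ids, get_status, wait_seconds=15, sleep=time.sleep):
    """Wait once, then return the executions that did not succeed."""
    # Single wait followed by a single status check per execution.
    sleep(wait_seconds)
    statuses = {eid: get_status(eid) for eid in execution_ids}
    # Assumption: "SUCCESS" is the status reported for a completed run.
    return {eid: status for eid, status in statuses.items() if status != "SUCCESS"}
```

An empty result corresponds to every pipeline having completed successfully by the time of the check.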

All Pipelines Completed Successfully

  • Purpose: Confirms all pipelines executed successfully.
  • Output: Displays a success message if all pipelines succeed.

How to Run

Trigger Automatically:

  • Push or merge changes to the main branch to automatically trigger the workflow.

Trigger Manually:

  • Go to the Actions tab in your repository.
  • Select the workflow and click Run workflow.

Downloads

Licensed under: Matillion Free Subscription License