{"objects":[{"metadata":{"identifier":{"packageName":"Matillion.Data Loader.CDC","name":"Create or Refresh External Table (Exchange)","revision":2,"type":"DYNAMIC"},"rootJobReference":{"name":"Create or Refresh External Table - 1 - Process","type":"ORCHESTRATION","parameterMetadata":[{"slot":2,"variableName":"target_database","variableType":"SCALAR","displayName":"Target Database","description":"The Snowflake database containing the external table with change events and the target table.","defaultValue":[{"values":{"1":"[Environment Default]"}}],"defaultValueType":"TEXT","required":true},{"slot":3,"variableName":"target_schema","variableType":"SCALAR","displayName":"Target Schema","description":"The Snowflake schema containing the external table with change events and the target table.","defaultValue":[{"values":{"1":"[Environment Default]"}}],"defaultValueType":"TEXT","required":true},{"slot":4,"variableName":"snowflake_stage","variableType":"SCALAR","displayName":"External Snowflake Stage","description":"The name of the existing external stage which contains the files output by the CDC pipeline.","defaultValue":[{"values":{"1":""}}],"defaultValueType":"TEXT","required":true},{"slot":5,"variableName":"external_table","variableType":"SCALAR","displayName":"External Table","description":"The external table in Snowflake containing the CDC change events.","defaultValue":[{"values":{"1":""}}],"defaultValueType":"TEXT","required":true},{"slot":6,"variableName":"cloud_storage_url","variableType":"SCALAR","displayName":"Cloud Storage URL","description":"The URL to the location where the CDC change event files are stored. This should have the format:\n\ns3://
Please see the full shared job documentation at https://documentation.matillion.com/docs/sharedjobs-cdc
Property | Type | Description |
---|---|---|
Target Database | Variable | The Snowflake database containing the external table with change events and the target table. |
Target Schema | Variable | The Snowflake schema containing the external table with change events and the target table. |
External Snowflake Stage | Variable | The name of the existing external stage which contains the files output by the CDC pipeline. |
External Table | Variable | The external table in Snowflake containing the CDC change events. |
Cloud Storage URL | Variable | The URL to the location where the CDC change event files are stored. This should have the format:\n\ns3://<bucket>/<prefix>/\nor\nazure://<account>.blob.core.windows.net/<container>/<prefix>/\nor\ngs://<bucket>/<prefix>/ |
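As a rough illustration of what this job maintains, the sketch below shows how an external table over a stage of CDC Avro files can be created and refreshed in Snowflake SQL. It is not the exact DDL issued by the shared job: the database, schema, stage, and table names are placeholders, and the stage is assumed to already point at the Cloud Storage URL written to by the CDC pipeline.

```sql
-- Illustrative sketch only: object names are placeholders, not values used by the job.
-- MY_CDC_STAGE is assumed to be an existing external stage pointing at the
-- Cloud Storage URL (e.g. s3://<bucket>/<prefix>/) containing the CDC Avro files.
CREATE EXTERNAL TABLE IF NOT EXISTS MY_DB.MY_SCHEMA.MY_CDC_EVENTS
  WITH LOCATION = @MY_DB.MY_SCHEMA.MY_CDC_STAGE
  FILE_FORMAT = (TYPE = AVRO)
  AUTO_REFRESH = FALSE;

-- Re-scan the stage so newly arrived change event files become queryable.
ALTER EXTERNAL TABLE MY_DB.MY_SCHEMA.MY_CDC_EVENTS REFRESH;
```

Each row of such an external table exposes one Avro record through Snowflake's VALUE variant column, which is what the downstream sync jobs query.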
Please see the full shared job documentation at https://documentation.matillion.com/docs/sharedjobs-cdc
Property | Type | Description |
---|---|---|
Cloud Storage URL | Variable | The URL of the location where the CDC change event files are stored. This should have the format:\n\ns3://<bucket>/<prefix>/\nor\nazure://<account>.blob.core.windows.net/<container>/<prefix>/\nor\ngs://<bucket>/<prefix>/ |
Warehouse | Variable | The Snowflake warehouse used to process the data. |
Target Database | Variable | The Snowflake database containing the external table with change events and the target table. |
Target Schema | Variable | The Snowflake schema containing the external table with change events and the target table. |
Snowflake External Stage | Variable | The name of the existing external stage which contains the files output by the CDC pipeline. |
External Table | Variable | The external table in Snowflake containing the CDC change events. |
Concurrency | Variable | How to iterate over the source tables for which there are CDC event data:\n\nSequential: Iterations are done in sequence, waiting for each to complete before starting the next. This is the default.\nConcurrent: Iterations are run concurrently. |
Target Table Prefix | Variable | A prefix to prepend to the source table name to generate the target table name. If no Target Table Prefix is specified, the target table will have the same name as the source table. |
Fully Qualify Target Table (Recommended) | Variable | Optionally includes the source database and schema in the target table name.\n\nY: The target table name will have the format \n<source_database>_<source_schema>_<source_table>\n\nN: The target table name will have the format \n<source_table>\n\nIt is strongly recommended to leave this set to 'Y', unless you are confident that your source table names will always be unique.\n\nIf the target table name is not fully qualified, and there are source tables with the same name in different databases or schemas, this job will see them as the same table and attempt to merge their metadata and contents. |
Transformation Type | Variable | The type of transformation used when applying the change events to the target table.\n\nCopy Table: The target table will be maintained as a copy of the source table\nCopy Table with Soft Deletes: Same as Copy Table, but records deleted in the source table will be retained in the target table.\nChange Log: All change events will be extracted and appended to the target table\n\nNote: A primary key is required on the source table for Copy Table and Copy Table with Soft Deletes transformations. The primary key is used by the shared job to merge updates into the target table. If the source table does not have a primary key, the transformation type will be updated to Change Log.\n\nWarning: If you intend to change the type of transformation for an already existing / running workflow, you must also drop the target table(s) before changing the transformation type. The transformation types are mutually exclusive and cannot be applied to the same target table(s). |
Append All Metadata | Variable | Append all metadata columns to the target table. Options are:\n\nY: Append all metadata columns\nN: Only append metadata columns required by this job |
Azure Key Vault URL | Variable | Required for tagging files in Azure Blob Storage\nThe URL of the Azure Key Vault used to store the Azure Blob Storage connection string. This is also called the vault's \"DNS Name\". |
Azure Blob Storage Account Key Secret | Variable | Required for tagging files in Azure Blob Storage\nName of the Azure Key Vault secret which stores the Azure Blob Storage account key. |
Schema Drift Action | Variable | The action taken when schema changes are detected in the source table, which require an update to the target table:\n\nUpdate Target: The target table is updated according to the new schema.\nFail Job: This job will fail, causing no changes to the target table. |
Logging Enabled | Variable | Enable logging of row counts from transformation jobs. Options are:\n\nY: Log row counts\nN: Do not log row counts\n\nNote: Logged row counts may not match the number of rows processed by the CDC agent. For the Copy Table and Copy Table with Soft Deletes transformation types, if a given record has more than one change event in a job execution, only the final change event will be processed. The \"after\" object of the final change event contains the net changes from all the previous events. |
Log Table | Variable | The name of the table in Snowflake containing the logged events. |
Trim All String Columns | Variable | Remove trailing whitespace from string columns. Options are:\n\nY: Remove trailing whitespace\nN: Do not remove trailing whitespace\n |
Tag Processed Files | Variable | Once the Avro files have been processed, update the tags or metadata in cloud storage. This enables a lifecycle management policy to be applied to files to reduce storage costs. Options are:\n\nY: Update tags / metadata\nN: Do not update tags / metadata\n\nS3 and Azure Blob Storage\nA tag of matillion_cdc_processed = true will be added to each file. If a file already has the maximum 10 tags, the tagging process will be skipped.\n\nNote: For Azure Blob Storage, Blob Index Tags are not supported on Storage Accounts with a hierarchical namespace enabled.\n\nGoogle Cloud Storage\nThe 'Custom-Time' metadata will be updated to be the time the file was processed. If the 'Custom-Time' has already been set to a value in the future, the metadata update will be skipped. |
Bytes to Decimal Function | Variable | For some source databases (e.g. PostgreSQL and Oracle), numeric columns can be defined without specifying a scale or precision, and these columns are able to contain values with different scales and precisions. When these columns are extracted to the Avro files, they are denoted as the type VariableScaleDecimal.\n\nA VariableScaleDecimal value is a variant object with two properties, the value stored as a byte array, and a numeric scale. Snowflake SQL cannot natively convert these byte arrays back to the original numeric format. If your source tables contain such columns, a User Defined Function is required by the shared job to convert the values.\n\nIf no value is specified for this parameter, the shared job will not attempt to convert the byte arrays, and will load them to columns of type variant.\n\nIf a value is specified, the shared job will check whether the user defined function already exists. If it does not exist, the job will attempt to create it. If the job cannot create the function due to insufficient privileges, the job will fail and the function must be created in the Snowflake Console.\n\nIf the parameter is set to a value like BYTES_TO_DECIMAL, the job will look for the function in the environment default database and schema.\n\nIf the parameter is set to a value like CDC_DB.PUBLIC.BYTES_TO_DECIMAL, the job will look for the function in the PUBLIC schema of the CDC_DB database. |
Use Source Schemas | Variable | Optionally create the target tables in a schema with the same name as the source schema.\nIf the schema doesn't already exist, the job will try to create it.\n\nY: Create the target tables in a schema with the same name as the source schema.\nN: Create the target tables in the schema specified by the Target Schema parameter.\n\nNote: Whichever option is selected, the external tables will still be created in the schema specified by the Target Schema parameter. |
Primary Key Override | Grid | Optionally provide a list of primary key columns for the source tables.\n\nBy default, the job will read the primary key columns from the CDC Avro files. However, if the source table does not have a primary key defined in its DDL, a list of unique columns can be specified here to enable Copy Table transformations.\n\nNote: The values for the source_database, source_schema, source_table, and source_column are case sensitive, and must match the source database. |
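For the Copy Table transformation type described above, the primary key is used to merge change events into the target table, and when a record has several change events in one run only the final event is applied. The sketch below is a heavily simplified illustration of that pattern in Snowflake SQL, not the shared job's actual logic: the table names, key column (id), payload column (name), and the event fields referenced inside the VARIANT (after, before, op, ts_ms) are all assumptions.

```sql
-- Illustration only: all object and column names are placeholders, and the
-- event structure referenced inside the VARIANT is assumed, not the job's own.
MERGE INTO TARGET_DB.TARGET_SCHEMA.MY_TARGET AS tgt
USING (
    -- Keep only the latest change event per primary key for this run.
    SELECT
        COALESCE(VALUE:after:id, VALUE:before:id)::NUMBER AS id,
        VALUE:after:name::VARCHAR                         AS name,
        VALUE:op::VARCHAR                                 AS op
    FROM TARGET_DB.TARGET_SCHEMA.MY_CDC_EVENTS
    QUALIFY ROW_NUMBER() OVER (
        PARTITION BY COALESCE(VALUE:after:id, VALUE:before:id)
        ORDER BY VALUE:ts_ms DESC) = 1
) AS src
ON tgt.id = src.id
WHEN MATCHED AND src.op = 'd' THEN DELETE
WHEN MATCHED THEN UPDATE SET tgt.name = src.name
WHEN NOT MATCHED AND src.op <> 'd' THEN INSERT (id, name) VALUES (src.id, src.name);
```

Broadly speaking, Copy Table with Soft Deletes would retain rather than delete the matched row for a delete event, and Change Log would simply append every event instead of merging.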
Please see the full shared job documentation at https://documentation.matillion.com/docs/sharedjobs-cdc
Property | Type | Description |
---|---|---|
Data Source | Variable | The CDC pipeline data source. Must be one of the supported CDC pipeline sources in Matillion Data Loader:\n\n- DB2_IBM_i\n- MySQL\n- Oracle\n- PostgreSQL\n- SQLServer |
Source Database | Variable | The name of the database containing the source table. |
Source Schema | Variable | The name of the schema containing the source table. |
Source Table | Variable | The name of the source table. |
Cloud Storage URL | Variable | The URL of the location where the CDC change event files are stored. This should have the format:\n\ns3://<bucket>/<prefix>/\nor\nazure://<account>.blob.core.windows.net/<container>/<prefix>/\nor\ngs://<bucket>/<prefix>/ |
Warehouse | Variable | The Snowflake warehouse used to process the data. |
Target Database | Variable | The Snowflake database containing the external table with change events and the target table. |
Target Schema | Variable | The Snowflake schema containing the external table with change events and the target table. |
External Table | Variable | The external table in Snowflake containing the CDC change events. |
Target Table Prefix | Variable | A prefix to prepend to the source table name to generate the target table name. If no Target Table Prefix is specified, the target table will have the same name as the source table. |
Fully Qualify Target Table (Recommended) | Variable | Optionally includes the source database and schema in the target table name.\n\nY: The target table name will have the format \n<source_database>_<source_schema>_<source_table>\n\nN: The target table name will have the format \n<source_table>\n\nIt is strongly recommended to leave this set to 'Y', unless you are confident that your source table names will always be unique.\n\nIf the target table name is not fully qualified, and there are source tables with the same name in different databases or schemas, this job will see them as the same table and attempt to merge their metadata and contents. |
Transformation Type | Variable | The type of transformation used when applying the change events to the target table.\n\nCopy Table: The target table will be maintained as a copy of the source table\nCopy Table with Soft Deletes: Same as Copy Table, but records deleted in the source table will be retained in the target table.\nChange Log: All change events will be extracted and appended to the target table\n\nNote: A primary key is required on the source table for Copy Table and Copy Table with Soft Deletes transformations. The primary key is used by the shared job to merge updates into the target table. If the source table does not have a primary key, the transformation type will be updated to Change Log.\n\nWarning: If you intend to change the type of transformation for an already existing / running workflow, you must also drop the target table(s) before changing the transformation type. The transformation types are mutually exclusive and cannot be applied to the same target table(s). |
Append All Metadata | Variable | Append all metadata columns to the target table. Options are:\n\nY: Append all metadata columns\nN: Only append metadata columns required by this job |
Azure Key Vault URL | Variable | Required for tagging files in Azure Blob Storage\nThe URL of the Azure Key Vault used to store the Azure Blob Storage connection string. This is also called the vault's \"DNS Name\". |
Azure Blob Storage Account Key Secret | Variable | Required for tagging files in Azure Blob Storage\nName of the Azure Key Vault secret which stores the Azure Blob Storage account key. |
Schema Drift Action | Variable | The action taken when schema changes are detected in the source table, which require an update to the target table:\n\nUpdate Target: The target table is updated according to the new schema.\nFail Job: This job will fail, causing no changes to the target table. |
Logging Enabled | Variable | Enable logging of row counts from transformation jobs. Options are:\n\nY: Log row counts\nN: Do not log row counts\n\nNote: Logged row counts may not match the number of rows processed by the CDC agent. For the Copy Table and Copy Table with Soft Deletes transformation types, if a given record has more than one change event in a job execution, only the final change event will be processed. The \"after\" object of the final change event contains the net changes from all the previous events. |
Log Table | Variable | The name of the table in Snowflake containing the logged events. |
Trim All String Columns | Variable | Remove trailing whitespace from string columns. Options are:\n\nY: Remove trailing whitespace\nN: Do not remove trailing whitespace |
Tag Processed Files | Variable | Once the Avro files have been processed, update the tags or metadata in cloud storage. This enables a lifecycle management policy to be applied to files to reduce storage costs. Options are:\n\nY: Update tags / metadata\nN: Do not update tags / metadata\n\nS3 and Azure Blob Storage\nA tag of matillion_cdc_processed = true will be added to each file. If a file already has the maximum 10 tags, the tagging process will be skipped.\n\nNote: For Azure Blob Storage, Blob Index Tags are not supported on Storage Accounts with a hierarchical namespace enabled.\n\nGoogle Cloud Storage\nThe 'Custom-Time' metadata will be updated to be the time the file was processed. If the 'Custom-Time' has already been set to a value in the future, the metadata update will be skipped.\n |
Bytes to Decimal Function | Variable | For some source databases (e.g. PostgreSQL and Oracle), numeric columns can be defined without specifying a scale or precision, and these columns are able to contain values with different scales and precisions. When these columns are extracted to the Avro files, they are denoted as the type VariableScaleDecimal.\n\nA VariableScaleDecimal value is a variant object with two properties, the value stored as a byte array, and a numeric scale. Snowflake SQL cannot natively convert these byte arrays back to the original numeric format. If your source tables contain such columns, a User Defined Function is required by the shared job to convert the values.\n\nIf no value is specified for this parameter, the shared job will not attempt to convert the byte arrays, and will load them to columns of type variant.\n\nIf a value is specified, the shared job will check whether the user defined function already exists. If it does not exist, the job will attempt to create it. If the job cannot create the function due to insufficient privileges, the job will fail and the function must be created in the Snowflake Console.\n\nIf the parameter is set to a value like BYTES_TO_DECIMAL, the job will look for the function in the environment default database and schema.\n\nIf the parameter is set to a value like CDC_DB.PUBLIC.BYTES_TO_DECIMAL, the job will look for the function in the PUBLIC schema of the CDC_DB database. |
Use Source Schemas | Variable | Optionally create the target tables in a schema with the same name as the source schema.\nIf the schema doesn't already exist, the job will try to create it.\n\nY: Create the target tables in a schema with the same name as the source schema.\nN: Create the target tables in the schema specified by the Target Schema parameter.\n\nNote: Whichever option is selected, the external tables will still be created in the schema specified by the Target Schema parameter. |
Primary Key Override | Grid | Optionally provide a list of primary key columns for the source tables.\n\nBy default, the job will read the primary key columns from the CDC Avro files. However, if the source table does not have a primary key defined in its DDL, a list of unique columns can be specified here to enable Copy Table transformations.\n\nNote: The values for the source_database, source_schema, source_table, and source_column are case sensitive, and must match the source database. |
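The Bytes to Decimal Function parameter above notes that, where the job lacks the privileges to create the user defined function, it must be created manually in the Snowflake Console. The exact name, signature, and body the shared job expects are defined by Matillion and are not reproduced here; the sketch below only illustrates the underlying conversion (a big-endian two's-complement byte array plus a scale), using an assumed signature and a deliberately different name.

```sql
-- Illustration only: the function name and signature here are assumptions and
-- may differ from what the shared job actually expects; consult the Matillion
-- documentation before creating the real function.
CREATE OR REPLACE FUNCTION MY_DB.PUBLIC.BYTES_TO_DECIMAL_EXAMPLE(B BINARY, S FLOAT)
RETURNS FLOAT
LANGUAGE JAVASCRIPT
AS
$$
  // B arrives as a Uint8Array holding a big-endian two's-complement integer.
  var unscaled = 0;
  for (var i = 0; i < B.length; i++) {
    unscaled = unscaled * 256 + B[i];
  }
  // If the sign bit of the first byte is set, the value is negative.
  if (B.length > 0 && (B[0] & 0x80) !== 0) {
    unscaled -= Math.pow(256, B.length);
  }
  // Apply the scale; precision is limited to what a JavaScript double can represent.
  return unscaled / Math.pow(10, S);
$$;
```

For example, `SELECT MY_DB.PUBLIC.BYTES_TO_DECIMAL_EXAMPLE(TO_BINARY('F2', 'HEX'), 1)` returns -1.4, since 0xF2 is -14 in two's complement and a scale of 1 shifts it one decimal place.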
Drop all the tables created by the Sync All Tables and Sync Single Table shared jobs.
Property | Type | Description |
---|---|---|
Cloud Storage URL | Variable | The URL of the location where the CDC change event files are stored. This should have the format:\n\ns3://<bucket>/<prefix>/\nor\nazure://<account>.blob.core.windows.net/<container>/<prefix>/\nor\ngs://<bucket>/<prefix>/ |
Target Database | Variable | The Snowflake database containing the external table with change events and the target table. |
Target Schema | Variable | The Snowflake schema containing the external table with change events and the target table. |
External Table | Variable | The external table in Snowflake containing the CDC change events. |
Target Table Prefix | Variable | A prefix to prepend to the source table name to generate the target table name. If no Target Table Prefix is specified, the target table will have the same name as the source table. |
Fully Qualify Target Table | Variable | Optionally includes the source database and schema in the target table name.\n\nY: The target table name will have the format \n<source_database>_<source_schema>_<source_table>\n\nN: The target table name will have the format \n<source_table>\n\nIt is strongly recommended to leave this set to 'Y', unless you are confident that your source table names will always be unique.\n\nIf the target table name is not fully qualified, and there are source tables with the same name in different databases or schemas, this job will see them as the same table and attempt to merge their metadata and contents. |
Azure Key Vault URL | Variable | Required for Azure Blob Storage\nThe URL of the Azure Key Vault used to store the Azure Blob Storage connection string. This is also called the vault's "DNS Name". |
Azure Blob Storage Account Key Secret | Variable | Required for Azure Blob Storage\nName of the Azure Key Vault secret which stores the Azure Blob Storage account key. |
Actually drop the tables | Variable | Optionally drop the CDC tables from the target database.\n\nY: The CDC tables will be dropped.\n\nN: The CDC tables will not be dropped. A list of the tables can be exported from the cdc_tables grid variable. |
CDC Tables (for exporting only) | Grid | Do not set this parameter; it will have no effect. A cdc_tables grid variable can be exported, containing the full list of tables created by the Sync All Tables shared job. |
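As a pointer on what this job removes: the Fully Qualify Target Table parameter above determines how the target table names were generated, and setting "Actually drop the tables" to Y causes those tables to be dropped. The sketch below uses hypothetical source tables to show the naming difference and the kind of statements a drop amounts to; it is not the shared job's own SQL, and the database, schema, and table names are placeholders.

```sql
-- Hypothetical sources: SALES_DB.PUBLIC.ORDERS and FINANCE_DB.PUBLIC.ORDERS,
-- with no Target Table Prefix set.
--
--   Fully Qualify Target Table = Y  ->  SALES_DB_PUBLIC_ORDERS and FINANCE_DB_PUBLIC_ORDERS
--   Fully Qualify Target Table = N  ->  ORDERS (both sources map onto the same target table)
--
-- With "Actually drop the tables" = Y, removing the fully qualified targets
-- amounts to statements like:
DROP TABLE IF EXISTS TARGET_DB.TARGET_SCHEMA.SALES_DB_PUBLIC_ORDERS;
DROP TABLE IF EXISTS TARGET_DB.TARGET_SCHEMA.FINANCE_DB_PUBLIC_ORDERS;
```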
Property | Type | Description |
---|---|---|
Cloud Storage URL | Variable | The URL of the location where the CDC change event files are stored. This should have the format:\n\ns3://<bucket>/<prefix>/\nor\nazure://<account>.blob.core.windows.net/<container>/<prefix>/\nor\ngs://<bucket>/<prefix>/ |
Azure Key Vault URL | Variable | Required for Azure Blob Storage\nThe URL of the Azure Key Vault used to store the Azure Blob Storage connection string. This is also called the vault's \"DNS Name\". |
Azure Blob Storage Account Key Secret | Variable | Required for Azure Blob Storage\nName of the Azure Key Vault secret which stores the Azure Blob Storage account key. |
Tables to Process | Grid | Optional.\n\nIf left blank, the job will tag the Avro and schema.avsc files for every table in the pipeline.\n\nIf you would like the job to only process files for a subset of tables in the pipeline, enter the names of the tables into this grid variable. |