Description

Tools for cleaning unwanted strings out of files and then copying the cleaned files to a new target directory.



Available Actions

Clean and Copy

Scan the files under a source bucket path, apply each configured match/replace combination to them, and write the fixed files to a new target directory.


Variables

Connection Information
tracking_db required

Database that stores which files have been processed and their status.

tracking_table required

Schema and table name where the file statuses will be stored.

source_aws_connection

AWS credentials with access to the source and target S3 locations.


Replacement Configurations
replacements required

Python list of replacements to scan for in each line of the file.
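The exact shape of the replacements list is not specified here. As a minimal sketch, assuming each entry is a (match, replace) pair of plain strings, the per-line cleaning could look like the following. The clean_line helper and the sample values are hypothetical and are not part of the plugin.

```python
# Assumed shape: a list of (match, replace) string pairs.
# The plugin's real format for `replacements` may differ.
replacements = [
    ("\\N", ""),       # drop literal \N null markers
    ("\r\n", "\n"),    # normalize Windows line endings
]


def clean_line(line, replacements):
    """Apply every (match, replace) pair to one line, in list order."""
    for match, replace in replacements:
        line = line.replace(match, replace)
    return line


cleaned = clean_line("a\\N,b\r\n", replacements)
```

Because the pairs are applied in order, an earlier replacement can change what a later one matches, so the order of the list matters.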


Source Info
source_compression required

Compression type of the source files.

source_bucket

S3 bucket where the source files are located.

source_s3_path required

Path where original teak files are located.


Target Info
target_s3_path required

Path where converted files will be transferred to.

target_bucket

S3 bucket where you want to unload the data.


Compression Info
target_compression required

Target file format

source_format required

Source file format


Runtime Arguments
reprocess required

If True, reprocess files even if they have already been processed successfully.
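The effect of reprocess on file selection can be sketched as below. This is an illustration only: it assumes the set of successfully processed keys has already been read from the tracking table, and the files_to_process helper is hypothetical.

```python
def files_to_process(candidates, processed_ok, reprocess):
    """Pick which source files to clean on this run.

    candidates   -- file keys found under the source path
    processed_ok -- keys already recorded as successfully processed
                    (assumed to come from the tracking table)
    reprocess    -- the `reprocess` runtime argument
    """
    if reprocess:
        # Ignore the tracking history and handle every file again.
        return list(candidates)
    # Otherwise skip anything that already succeeded.
    return [key for key in candidates if key not in processed_ok]


found = ["a.csv.gz", "b.csv.gz", "c.csv.gz"]
done = {"a.csv.gz"}
pending = files_to_process(found, done, reprocess=False)
```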


Date Enabled Arguments
date_source_pattern

Date pattern for the source file paths, for example: /yr={YYYY}/m={MM}/d={DD}

date_target_pattern

Date pattern for the target file paths, for example: /yr={YYYY}/m={MM}/d={DD}. If empty, the source pattern is used.

date_lookback required

Number of days to look back from the current date for new files. Paths are built from the date pattern provided in Source Date Path Pattern (date_source_pattern).

use_date_path_patterns required

If true, the process applies date replacements to the source and target locations. The resulting date paths are appended to the source and target paths specified above.
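To illustrate how a date pattern and date_lookback might combine, here is a minimal sketch that expands the {YYYY}, {MM} and {DD} placeholders for each day in the lookback window and appends the result to a base path. The expand_date_paths helper is hypothetical, and whether the window includes the current date is an assumption.

```python
from datetime import date, timedelta


def expand_date_paths(base_path, pattern, lookback_days, today=None):
    """Return one path per day, newest first.

    Assumes the window covers today plus `lookback_days` earlier days;
    the plugin's actual boundaries may differ.
    """
    today = today or date.today()
    paths = []
    for offset in range(lookback_days + 1):
        day = today - timedelta(days=offset)
        suffix = (
            pattern.replace("{YYYY}", f"{day.year:04d}")
            .replace("{MM}", f"{day.month:02d}")
            .replace("{DD}", f"{day.day:02d}")
        )
        # Append the date suffix to the configured source or target path.
        paths.append(base_path.rstrip("/") + suffix)
    return paths


example = expand_date_paths(
    "raw/events", "/yr={YYYY}/m={MM}/d={DD}", 1, today=date(2024, 3, 1)
)
```

With an empty date_target_pattern, the same call would be made with the source pattern and the target base path.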