Tools for cleaning files of various string inputs, then copying them to a new target directory.
Scans files under a source bucket path, applies the given match/replace combinations to fix each file, and writes the cleaned files to a new target directory.
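As a rough illustration, the per-file cleaning step could be sketched as follows. This is a minimal, line-oriented sketch, not the tool's actual implementation: the names `REPLACEMENTS` and `clean_file` are hypothetical, and it assumes gzip-compressed text sources.

```python
import gzip
import os

# Hypothetical replacement list: each entry is a (match, replace) pair
# applied to every line of every source file.
REPLACEMENTS = [
    ("\x00", ""),    # strip NUL bytes
    ("|", ","),      # swap delimiter
]

def clean_file(source_path, target_path, replacements):
    """Read a gzip-compressed source file, apply each match/replace
    pair to every line, and write the result under the target path."""
    os.makedirs(os.path.dirname(target_path), exist_ok=True)
    with gzip.open(source_path, "rt") as src, open(target_path, "w") as dst:
        for line in src:
            for match, replace in replacements:
                line = line.replace(match, replace)
            dst.write(line)
```

In the real process the source and target would be S3 paths rather than local files, and each file's outcome would be recorded in the tracking table described below.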
Variables
Database where you store which files have been processed and their status.
Tracking schema and table name where the file statuses will be stored.
AWS credentials with access to the source and target S3 locations.
Python list of match/replace pairs to scan for on each line encountered within the file.
Compression type of the source files.
S3 bucket where the source files are located.
Path where the original source files are located.
Path where converted files will be transferred to.
S3 bucket where you want to unload the converted files to.
Target file format
Source file format
If True, reprocess files even if they have already been successfully processed before.
Pattern-matched path for the source files, for example: /yr={YYYY}/m={MM}/d={DD}.
Pattern-matched path for the target files, for example: /yr={YYYY}/m={MM}/d={DD}. If empty, the source pattern is used.
Number of days back from the current date to look for new files. Paths are based on the date pattern provided in Source Date Path Pattern.
If true, the process will use date replacements for the source and target locations. The expanded date paths will be appended to the source and target paths specified above.
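To illustrate how the date path pattern and lookback window might combine, the following sketch expands the pattern from the example above into one concrete path per day. The function name `expand_date_paths` is hypothetical; only the `{YYYY}`/`{MM}`/`{DD}` tokens come from the pattern example in this document.

```python
from datetime import date, timedelta

def expand_date_paths(base_path, pattern, lookback_days, today=None):
    """Expand a date path pattern like '/yr={YYYY}/m={MM}/d={DD}' into
    one concrete path per day in the lookback window, each appended to
    the base source or target path."""
    today = today or date.today()
    paths = []
    for offset in range(lookback_days + 1):
        d = today - timedelta(days=offset)
        suffix = (pattern
                  .replace("{YYYY}", f"{d.year:04d}")
                  .replace("{MM}", f"{d.month:02d}")
                  .replace("{DD}", f"{d.day:02d}"))
        paths.append(base_path.rstrip("/") + suffix)
    return paths
```

With a lookback of 1 day, this yields today's path and yesterday's path, each of which would then be scanned for new files.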