Scan AWS S3 using Athena and Glue

Command Options

| Option | Default | Description |
|---|---|---|
| access-key | None | AWS Access Key [required] |
| secret-key | None | AWS Secret Key [required] |
| staging-dir | None | S3 Staging Directory for Athena results [required] |
| region | None | AWS Region [required] |
| scan-type | shallow | One of deep, shallow. A deep scan checks sample data. A shallow scan checks column names using regular expressions. |
| list-all | False | List all columns. By default, only columns with PII information are listed. |
| schema | None | Scan only schemas matching the pattern. Refer to Include/Exclude Lists. |
| exclude-schema | None | Do not scan any schemas matching the pattern. Refer to Include/Exclude Lists. |
| table | None | Scan only tables matching the pattern. Refer to Include/Exclude Lists. |
| exclude-table | None | Do not scan any tables matching the pattern. Refer to Include/Exclude Lists. |
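
The difference between the two scan types can be illustrated with a minimal sketch of the shallow approach: column names are matched against regular expressions and no data is read. The patterns and the `shallow_scan` function below are hypothetical and chosen only for illustration; they are not piicatcher's actual rules or API.

```python
import re

# Hypothetical patterns for illustration only; piicatcher's real rules may differ.
PII_NAME_PATTERNS = {
    "email": re.compile(r"e?mail", re.IGNORECASE),
    "phone": re.compile(r"phone|mobile", re.IGNORECASE),
    "ssn": re.compile(r"ssn|social_?security", re.IGNORECASE),
}


def shallow_scan(column_names):
    """Return {column_name: [matched labels]} by inspecting names only."""
    findings = {}
    for name in column_names:
        labels = [
            label
            for label, pattern in PII_NAME_PATTERNS.items()
            if pattern.search(name)
        ]
        if labels:
            findings[name] = labels
    return findings


print(shallow_scan(["id", "email_address", "home_phone", "created_at"]))
```

A deep scan would instead sample rows from each column and inspect the values, which is slower but can catch PII stored under uninformative column names.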

Command Line

piicatcher aws --help
Usage: piicatcher aws [OPTIONS]

Options:
  -a, --access-key TEXT           AWS Access Key  [required]
  -s, --secret-key TEXT           AWS Secret Key  [required]
  -d, --staging-dir TEXT          S3 Staging Directory for Athena results
                                  [required]
  -r, --region TEXT               AWS Region  [required]
  -c, --scan-type [deep|shallow]  Choose deep(scan data) or shallow(scan
                                  column names only)
  --list-all                      List all columns. By default only columns
                                  with PII information is listed
  -n, --schema TEXT               Scan only schemas matching schema.
  -N, --exclude-schema TEXT       Do not scan any schemas matching the schema
                                  pattern.
  -t, --table TEXT                Dump only tables matching table.
  -T, --exclude-table TEXT        Do not dump any tables matching the table
                                  pattern.
  --help                          Show this message and exit.
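
As a rough sketch of how these options combine, the helper below assembles an argument list for `piicatcher aws` using only the flags shown in the help text. The function name `build_aws_scan_command` and all values passed to it (keys, bucket, region, schema name) are placeholders invented for this example. The sketch only builds and prints the command; it does not run it.

```python
import shlex


def build_aws_scan_command(access_key, secret_key, staging_dir, region,
                           scan_type="shallow", list_all=False, schema=None):
    """Build an argv list for `piicatcher aws` (hypothetical helper)."""
    if scan_type not in ("deep", "shallow"):
        raise ValueError("scan_type must be 'deep' or 'shallow'")

    argv = [
        "piicatcher", "aws",
        "--access-key", access_key,
        "--secret-key", secret_key,
        "--staging-dir", staging_dir,
        "--region", region,
        "--scan-type", scan_type,
    ]
    if list_all:
        argv.append("--list-all")
    if schema is not None:
        argv.extend(["--schema", schema])
    return argv


# Placeholder values; substitute real credentials and an S3 location you own.
command = build_aws_scan_command(
    access_key="EXAMPLE_ACCESS_KEY",
    secret_key="EXAMPLE_SECRET_KEY",
    staging_dir="s3://example-bucket/athena-results/",
    region="us-east-1",
    scan_type="deep",
    schema="example_schema",
)
print(shlex.join(command))
```

The resulting list could be passed to `subprocess.run(command, check=True)` on a machine where piicatcher is installed. Passing a list rather than a single string avoids shell quoting problems with secret keys that contain special characters.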

Configuration File

[aws]
access_key="..."
secret_key="..."
staging_dir="..."
region="..."
scan_type="[deep|shallow]"
list_all=True|False
schema=("<schema>",["<schema2>", ...])
exclude_schema=("<schema>",["<schema2>", ...])
table=("<table>",["<table2>", ...])
exclude_table=("<table>",["<table2>", ...])