PII Catcher
Overview
PiiCatcher finds PII in your databases and files. It detects the following types of PII:
- PHONE
- CREDIT_CARD
- ADDRESS
- PERSON
- LOCATION
- BIRTH_DATE
- GENDER
- NATIONALITY
- IP_ADDRESS
- SSN
- USER_NAME
- PASSWORD
PiiCatcher uses three types of scanners to detect PII:
- CommonRegex uses a set of regular expressions to match common types of information.
- spaCy Named Entity Recognition uses natural language processing to detect named entities. Only English is currently supported.
- Column Name Scanner scans column names for names commonly given to columns that contain PII.
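To make the difference between value-based and name-based scanning concrete, here is a minimal Python sketch of the regex and column-name approaches. The patterns, the PII-type mapping, and the helpers scan_column_name, scan_values and has_pii are hypothetical illustrations, not PiiCatcher's actual rules or API. The NER scanner is left out because it needs a downloaded spaCy model.

```python
from __future__ import annotations

import re

# Hypothetical value patterns in the spirit of the CommonRegex scanner.
# These are illustrative assumptions, not PiiCatcher's actual regexes.
VALUE_PATTERNS = {
    "PHONE": re.compile(r"\(?\b\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "IP_ADDRESS": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}

# Hypothetical column-name patterns illustrating the Column Name Scanner idea.
COLUMN_NAME_PATTERNS = {
    "PHONE": re.compile(r"phone|mobile", re.IGNORECASE),
    "BIRTH_DATE": re.compile(r"birth|dob", re.IGNORECASE),
    "SSN": re.compile(r"ssn|social_security", re.IGNORECASE),
    "PASSWORD": re.compile(r"passw(or)?d|pwd", re.IGNORECASE),
    "USER_NAME": re.compile(r"user_?name|login", re.IGNORECASE),
    "GENDER": re.compile(r"gender|sex", re.IGNORECASE),
}


def scan_column_name(column_name: str) -> set[str]:
    """Return the PII types suggested by the column name alone."""
    return {pii for pii, pat in COLUMN_NAME_PATTERNS.items() if pat.search(column_name)}


def scan_values(values: list[str]) -> set[str]:
    """Return the PII types whose regex matches at least one sampled value."""
    return {
        pii
        for pii, pat in VALUE_PATTERNS.items()
        if any(pat.search(v) for v in values)
    }


def has_pii(column_name: str, values: list[str]) -> bool:
    """Flag a column if either scanner finds something."""
    return bool(scan_column_name(column_name) | scan_values(values))


if __name__ == "__main__":
    print(scan_column_name("customer_phone"))  # {'PHONE'}
    print(scan_values(["call 555-867-5309"]))  # {'PHONE'}
    print(has_pii("notes", ["nothing here"]))  # False
```

A column-name check alone cannot flag columns with uninformative names such as a and b in the example report, so has_pii("a", ["123-45-6789"]) only returns True because the value scanner matches; combining both kinds of scanner covers each one's blind spots.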
Supported Technologies
PiiCatcher supports the following filesystems:
- POSIX
- AWS S3 (for files that are part of tables in AWS Glue and AWS Athena)
- Google Cloud Storage (Coming Soon)
- ADLS (Coming Soon)
PiiCatcher supports the following databases:
- Sqlite3 v3.24.0 or greater
- MySQL 5.6 or greater
- PostgreSQL 9.4 or greater
- AWS Redshift
- SQL Server
- Oracle
- AWS Glue/AWS Athena
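The SQLite minimum of 3.24.0 is a library version. As a minimal sketch, Python's standard sqlite3 module can report the SQLite library linked into your interpreter, which may differ from the sqlite3 command-line shell on the same machine. Whether this is the library a given PiiCatcher installation uses is an assumption, and the helper name sqlite_supported is hypothetical.

```python
from __future__ import annotations

import sqlite3

# Minimum SQLite version listed under Supported Technologies.
MIN_SQLITE = (3, 24, 0)


def sqlite_supported(version: tuple[int, ...] = sqlite3.sqlite_version_info) -> bool:
    """Return True if `version` is at least MIN_SQLITE (element-wise tuple comparison)."""
    return tuple(version) >= MIN_SQLITE


if __name__ == "__main__":
    print(f"SQLite library linked into Python: {sqlite3.sqlite_version}")
    print("meets the 3.24.0 minimum" if sqlite_supported() else "older than 3.24.0")
```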
Example
# run piicatcher on a sqlite db and print report to console
piicatcher db -c '/db/sqlqb'
╭─────────────┬─────────────┬─────────────┬─────────────╮
│ schema      │ table       │ column      │ has_pii     │
├─────────────┼─────────────┼─────────────┼─────────────┤
│ main        │ full_pii    │ a           │ 1           │
│ main        │ full_pii    │ b           │ 1           │
│ main        │ no_pii      │ a           │ 0           │
│ main        │ no_pii      │ b           │ 0           │
│ main        │ partial_pii │ a           │ 1           │
│ main        │ partial_pii │ b           │ 0           │
╰─────────────┴─────────────┴─────────────┴─────────────╯
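The report has one row per column, and main is SQLite's name for the primary database schema. A minimal Python sketch of how such rows could be enumerated from a SQLite file with the standard sqlite3 module: the helper column_report, the pluggable detector callback, the sample size, and the sample table are all hypothetical, and this is not how PiiCatcher is implemented. Identifiers are quoted naively, so names containing double quotes are not handled.

```python
from __future__ import annotations

import re
import sqlite3
from typing import Callable


def column_report(
    conn: sqlite3.Connection,
    detector: Callable[[str, list[str]], bool],
    sample_size: int = 100,
) -> list[tuple[str, str, str, int]]:
    """Return one (schema, table, column, has_pii) row per column in 'main'."""
    rows = []
    tables = conn.execute(
        "SELECT name FROM main.sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        columns = [info[1] for info in conn.execute(f'PRAGMA main.table_info("{table}")')]
        for column in columns:
            # Read up to sample_size rows and keep the non-NULL values as strings.
            cursor = conn.execute(
                f'SELECT "{column}" FROM main."{table}" LIMIT ?', (sample_size,)
            )
            values = [str(v) for (v,) in cursor if v is not None]
            rows.append(("main", table, column, int(detector(column, values))))
    return rows


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical sample table, not the database used in the example above.
    conn.execute("CREATE TABLE contacts (a TEXT, b TEXT)")
    conn.execute("INSERT INTO contacts VALUES ('123-45-6789', 'hello')")

    # Placeholder detector: flag a column if any value looks like a US SSN.
    def looks_like_ssn(column: str, values: list[str]) -> bool:
        return any(re.fullmatch(r"\d{3}-\d{2}-\d{4}", v) for v in values)

    for row in column_report(conn, looks_like_ssn):
        print(row)
    # ('main', 'contacts', 'a', 1)
    # ('main', 'contacts', 'b', 0)
```

Passing the detector in as a callback keeps schema enumeration separate from detection, so any combination of scanners can decide the has_pii value.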