PII Catcher

Overview

PiiCatcher finds PII in your databases and files. It detects the following types of PII:

  • PHONE
  • EMAIL
  • CREDIT_CARD
  • ADDRESS
  • PERSON
  • LOCATION
  • BIRTH_DATE
  • GENDER
  • NATIONALITY
  • IP_ADDRESS
  • SSN
  • USER_NAME
  • PASSWORD

PiiCatcher uses three types of scanners to detect PII:

  1. CommonRegex uses a set of regular expressions for common types of information.
  2. spaCy Named Entity Recognition uses natural language processing to detect named entities. Only English is currently supported.
  3. Column Name Scanner scans column names for names commonly given to columns that contain PII.
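To make the regex-based and column-name approaches concrete, here is a minimal sketch in Python. It is not PiiCatcher's actual code: the pattern tables and the `scan_value` and `scan_column_name` helpers are hypothetical, and the patterns are deliberately simplistic.

```python
import re

# Hypothetical value patterns for illustration only; not PiiCatcher's rule set.
VALUE_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "IP_ADDRESS": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}

# Hypothetical column-name hints, matched case-insensitively.
COLUMN_NAME_PATTERNS = {
    "EMAIL": re.compile(r"e?mail", re.IGNORECASE),
    "PHONE": re.compile(r"phone|mobile", re.IGNORECASE),
    "SSN": re.compile(r"ssn|social", re.IGNORECASE),
    "PASSWORD": re.compile(r"pass(word|wd)?|pwd", re.IGNORECASE),
}


def scan_value(text):
    """Return the PII types whose value pattern matches somewhere in text."""
    return {pii for pii, pattern in VALUE_PATTERNS.items() if pattern.search(text)}


def scan_column_name(name):
    """Return the PII types suggested by the column name alone."""
    return {pii for pii, pattern in COLUMN_NAME_PATTERNS.items() if pattern.search(name)}


print(scan_value("contact: jane@example.com"))
print(scan_column_name("user_email"))
```

Value scanning inspects the data itself, while column-name scanning needs only metadata, so the two can disagree. A real scanner would also need far stricter patterns than these to limit false positives.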

Supported Technologies

PiiCatcher supports the following filesystems:

  • POSIX
  • AWS S3 (for files that are part of tables in AWS Glue and AWS Athena)
  • Google Cloud Storage (Coming Soon)
  • ADLS (Coming Soon)

PiiCatcher supports the following databases:

  1. SQLite3 v3.24.0 or greater
  2. MySQL 5.6 or greater
  3. PostgreSQL 9.4 or greater
  4. AWS Redshift
  5. SQL Server
  6. Oracle
  7. AWS Glue/AWS Athena

Example

# run piicatcher on a sqlite db and print report to console
piicatcher db -c '/db/sqlqb'
╭─────────────┬─────────────┬─────────────┬─────────────╮
│   schema    │    table    │   column    │   has_pii   │
├─────────────┼─────────────┼─────────────┼─────────────┤
│        main │    full_pii │           a │           1 │
│        main │    full_pii │           b │           1 │
│        main │      no_pii │           a │           0 │
│        main │      no_pii │           b │           0 │
│        main │ partial_pii │           a │           1 │
│        main │ partial_pii │           b │           0 │
╰─────────────┴─────────────┴─────────────┴─────────────╯
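
A report of this shape can be approximated with Python's standard `sqlite3` module. The sketch below is an illustration under stated assumptions, not how PiiCatcher works internally: the `scan_sqlite` helper, the single email pattern, and the sample rows are all hypothetical. Only the table and column names are taken from the report above.

```python
import re
import sqlite3

# Single hypothetical pattern; a real scanner would check many PII types.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def scan_sqlite(conn):
    """Return (table, column, has_pii) for every column of every table."""
    report = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
    for table in tables:
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        for column in columns:
            rows = conn.execute(f'SELECT "{column}" FROM "{table}"')
            has_pii = any(
                value is not None and EMAIL.search(str(value)) for (value,) in rows
            )
            report.append((table, column, int(has_pii)))
    return report


# Build an in-memory database with made-up rows mirroring the tables above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE full_pii (a TEXT, b TEXT);
    CREATE TABLE no_pii (a TEXT, b TEXT);
    CREATE TABLE partial_pii (a TEXT, b TEXT);
    INSERT INTO full_pii VALUES ('jane@example.com', 'joe@example.org');
    INSERT INTO no_pii VALUES ('abc', 'def');
    INSERT INTO partial_pii VALUES ('ann@example.net', 'xyz');
""")
report = scan_sqlite(conn)
for row in report:
    print(row)
```

Scanning every row of every column is acceptable for a toy database; on large tables, sampling rows would be a more practical choice.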