Scan Column Names Using Regular Expressions

Data engineers and schema designers use conventional column names for common data categories such as name, address, and Social Security number (SSN). These naming conventions are rooted in the culture of the region.

Tokern matches a set of common regular expressions against column names to detect columns that are likely to hold sensitive data.

The system-provided regular expressions (in Python syntax) are:

  • PERSON
^.*(firstname|fname|lastname|lname|fullname|fname|maidenname|_name|nickname|name_suffix|name).*$
  • EMAIL
^.*(email|e-mail|mail).*$
  • BIRTH_DATE
^.*(date_of_birth|dateofbirth|dob|birthday|date_of_death|dateofdeath).*$
  • GENDER
^.*(gender).*$
  • NATIONALITY
^.*(nationality).*$
  • ADDRESS
^.*(address|city|state|county|country|zipcode|postal).*$
  • USER NAME
^.*user(id|name|).*$
  • PASSWORD
^.*pass.*$
  • US Social Security Number
^.*(ssn|social).*$
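
The matching step can be sketched in Python as follows. This is a minimal illustration, not Tokern's actual implementation: the patterns are copied from the list above, while the names COLUMN_NAME_PATTERNS and classify_column, the dictionary keys, and the use of case-insensitive matching are assumptions made for the example.

```python
import re

# Patterns copied from the list above. The dictionary name, its keys, and
# the IGNORECASE flag are assumptions made for this sketch.
COLUMN_NAME_PATTERNS = {
    "PERSON": r"^.*(firstname|fname|lastname|lname|fullname|fname|maidenname|_name|nickname|name_suffix|name).*$",
    "EMAIL": r"^.*(email|e-mail|mail).*$",
    "BIRTH_DATE": r"^.*(date_of_birth|dateofbirth|dob|birthday|date_of_death|dateofdeath).*$",
    "GENDER": r"^.*(gender).*$",
    "NATIONALITY": r"^.*(nationality).*$",
    "ADDRESS": r"^.*(address|city|state|county|country|zipcode|postal).*$",
    "USER_NAME": r"^.*user(id|name|).*$",
    "PASSWORD": r"^.*pass.*$",
    "SSN": r"^.*(ssn|social).*$",
}

# Compile each pattern once so it can be reused for every column.
COMPILED = {
    label: re.compile(pattern, re.IGNORECASE)
    for label, pattern in COLUMN_NAME_PATTERNS.items()
}


def classify_column(column_name):
    """Return every category whose pattern matches the column name."""
    return [label for label, regex in COMPILED.items() if regex.match(column_name)]


if __name__ == "__main__":
    # Hypothetical column names, chosen only to exercise the patterns.
    for column in ["first_name", "email_address", "dob", "user_name", "passport_no", "order_total"]:
        print(column, classify_column(column))
```

Because the patterns are deliberately broad, a column can fall into more than one category (user_name matches both PERSON and USER_NAME), and some matches are false positives (passport_no matches PASSWORD). A match on the column name is best treated as a hint that the column deserves a closer look.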