API

Usage

from piicatcher import scan_file_object, scan_database

pii_types = scan_file_object(...)
catalog = scan_database(...)

scan_file_object

scan_file_object(fd: TextIO) -> List[Any]
Args:
fd (TextIO): A file object opened in text mode.
Returns: A list of PIITypes enum values, one for each PII type found in the file.
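As a minimal sketch of how this call might be wrapped, the helper below opens a path in text mode and passes the resulting file object to scan_file_object. The helper name scan_path and its scan parameter are hypothetical and not part of piicatcher; the parameter exists only so the helper can be exercised with a stand-in scanner when the library is not installed.

```python
from typing import Any, Callable, List, Optional, TextIO


def scan_path(
    path: str,
    scan: Optional[Callable[[TextIO], List[Any]]] = None,
) -> List[Any]:
    """Open `path` in text mode and return whatever the scanner reports.

    `scan` is a hypothetical hook for substituting a stand-in scanner;
    by default the piicatcher function documented above is used.
    """
    if scan is None:
        # Imported lazily so this module can be loaded without piicatcher.
        from piicatcher import scan_file_object as scan

    # Mode "r" opens the file in text mode, as the fd argument requires.
    with open(path, "r", encoding="utf-8") as fd:
        return scan(fd)
```

With piicatcher installed, scan_path("customers.csv") (an illustrative file name) would return the list described under Returns.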

scan_database

scan_database(
connection: Any,
connection_type: str,
scan_type: str = 'shallow',
include_schema: Tuple = (),
exclude_schema: Tuple = (),
include_table: Tuple = (),
exclude_table: Tuple = ()) -> Dict[Any, Any]
Args:
connection (connection): Connection object to a database
connection_type (str): Database type. One of sqlite, snowflake,
athena, redshift, postgres, mysql or oracle.
scan_type (str): Either deep (scan data) or shallow (scan column names only).
include_schema (List[str]):
List of patterns. Scan only schemas matching at least one pattern. When this
option is not specified, all non-system schemas in the target database are
scanned. Each pattern is interpreted as a regular expression, so a single
pattern can select multiple schemas by using wildcard characters.
exclude_schema (List[str]):
List of patterns. Do not scan any schema matching any pattern. Patterns
are interpreted according to the same rules as include_schema.
When both include_schema and exclude_schema are given, only the schemas
that match at least one include_schema pattern and no exclude_schema
pattern are scanned. If only exclude_schema is specified, the matching
schemas are excluded from the scan.
include_table (List[str]):
List of patterns to match tables. Similar in behaviour to include_schema.
exclude_table (List[str]):
List of patterns. Tables matching any pattern are not scanned.
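
The include/exclude rules described above can be illustrated with a small stand-alone sketch. This is not piicatcher's implementation: the function select_names is hypothetical, and the use of re.search (unanchored, case-sensitive) is an assumption, since the text only says that patterns are regular expressions. Filtering out system schemas is not modelled.

```python
import re
from typing import Iterable, List, Sequence


def select_names(
    names: Iterable[str],
    include: Sequence[str] = (),
    exclude: Sequence[str] = (),
) -> List[str]:
    """Keep names matching at least one include pattern and no exclude pattern.

    An empty `include` means "no restriction", mirroring the documented default.
    """
    selected = []
    for name in names:
        # Assumption: unanchored matching via re.search.
        included = not include or any(re.search(p, name) for p in include)
        excluded = any(re.search(p, name) for p in exclude)
        if included and not excluded:
            selected.append(name)
    return selected


schemas = ["public", "sales_eu", "sales_us", "staging"]

print(select_names(schemas))                                        # no filters
print(select_names(schemas, include=["^sales_"]))                   # include only
print(select_names(schemas, include=["^sales_"], exclude=["_us$"]))  # both
print(select_names(schemas, exclude=["^staging$"]))                 # exclude only
```

The four calls correspond to the four cases in the argument descriptions: no filter, include only, include combined with exclude, and exclude only.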

Returns:
dict: A dictionary of schemata, tables and columns
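
A hedged usage sketch for the signature above, using an in-memory SQLite database. It assumes that a standard sqlite3 connection is an acceptable connection object for connection_type "sqlite"; the text does not state this explicitly. The helpers build_demo_db and scan_demo_db, the customers table, and the scan parameter are all hypothetical and exist only for illustration.

```python
import sqlite3
from typing import Any, Callable, Dict, Optional


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with one illustrative table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, email TEXT, city TEXT)")
    conn.execute("INSERT INTO customers VALUES (1, 'jane@example.com', 'Oslo')")
    conn.commit()
    return conn


def scan_demo_db(
    conn: sqlite3.Connection,
    scan: Optional[Callable[..., Dict[Any, Any]]] = None,
) -> Dict[Any, Any]:
    """Run a deep scan restricted to the customers table.

    `scan` is a hypothetical hook for substituting a stand-in;
    by default the piicatcher function documented above is used.
    """
    if scan is None:
        # Imported lazily so this module can be loaded without piicatcher.
        from piicatcher import scan_database as scan

    return scan(
        connection=conn,
        connection_type="sqlite",
        scan_type="deep",               # scan data, not only column names
        include_table=("customers",),   # tuple, matching the signature default
    )
```

With piicatcher installed, scan_demo_db(build_demo_db()) would return the dictionary described under Returns.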