Two Methods to Scan for PII in Data Warehouses
An important requirement for data privacy and protection is finding and cataloging the tables and columns in a data warehouse that contain PII or PHI. Open-source data catalogs like DataHub and…
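The excerpt does not name the post's two methods. As a hedged illustration only, the Python sketch below shows two common heuristics for flagging PII: matching column names against patterns (a metadata scan) and matching sampled values against patterns (a data scan). The pattern sets, the `threshold` parameter, and the function names `scan_column_names` and `scan_sample_values` are hypothetical and are not taken from the post, from DataHub, or from any other catalog.

```python
import re

# Hypothetical, deliberately small pattern sets for illustration only.
COLUMN_NAME_PATTERNS = {
    "email": re.compile(r"e[-_]?mail", re.IGNORECASE),
    "phone": re.compile(r"phone|mobile|cell", re.IGNORECASE),
    "ssn": re.compile(r"ssn|social[-_]?security", re.IGNORECASE),
    "birth_date": re.compile(r"dob|birth[-_]?date", re.IGNORECASE),
}

VALUE_PATTERNS = {
    "email": re.compile(r"^[\w.+-]+@[\w-]+\.[\w.-]+$"),
    "ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}


def scan_column_names(schema):
    """Metadata scan: flag columns whose *names* look like PII.

    schema maps a table name to a list of column names.
    Returns (table, column, pii_type) tuples, at most one label per column.
    """
    hits = []
    for table, columns in schema.items():
        for column in columns:
            for pii_type, pattern in COLUMN_NAME_PATTERNS.items():
                if pattern.search(column):
                    hits.append((table, column, pii_type))
                    break  # first matching label wins
    return hits


def scan_sample_values(table, column, values, threshold=0.5):
    """Data scan: flag a column if enough sampled *values* match a PII pattern.

    NULLs are ignored; threshold is the hypothetical minimum match ratio.
    """
    non_null = [str(v) for v in values if v is not None]
    if not non_null:
        return []
    hits = []
    for pii_type, pattern in VALUE_PATTERNS.items():
        matched = sum(1 for v in non_null if pattern.match(v))
        if matched / len(non_null) >= threshold:
            hits.append((table, column, pii_type))
    return hits
```

For example, `scan_column_names({"users": ["id", "email_address"]})` flags `email_address` as an email column. Name matching is cheap because it only reads the schema, while value sampling can catch PII hidden behind a generic name such as `contact`, at the cost of querying the data.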
What is a Data Catalog?
A data catalog is the foundation of many capabilities, such as data discovery, governance, and security. At its simplest, a data catalog manages metadata about all the datasets in your company.
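To make "managing metadata" concrete, here is a minimal in-memory sketch, assuming a catalog only needs to store dataset and column metadata and answer two kinds of questions: discovery (keyword search) and governance (which columns carry a tag such as `pii`). All class and method names are hypothetical; real catalogs typically persist this metadata and ingest it from source systems rather than registering it by hand.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ColumnMetadata:
    name: str
    data_type: str
    tags: set[str] = field(default_factory=set)  # e.g. {"pii"}


@dataclass
class DatasetMetadata:
    name: str  # fully qualified, e.g. "analytics.users"
    owner: str
    description: str
    columns: list[ColumnMetadata] = field(default_factory=list)


class DataCatalog:
    """Toy in-memory catalog keyed by dataset name (hypothetical API)."""

    def __init__(self) -> None:
        self._datasets: dict[str, DatasetMetadata] = {}

    def register(self, dataset: DatasetMetadata) -> None:
        # Re-registering the same name overwrites the previous metadata.
        self._datasets[dataset.name] = dataset

    def search(self, keyword: str) -> list[str]:
        """Discovery: sorted names of datasets whose name, description,
        or column names contain keyword (case-insensitive)."""
        kw = keyword.lower()
        hits = []
        for ds in self._datasets.values():
            texts = [ds.name, ds.description, *(c.name for c in ds.columns)]
            if any(kw in text.lower() for text in texts):
                hits.append(ds.name)
        return sorted(hits)

    def columns_tagged(self, tag: str) -> list[tuple[str, str]]:
        """Governance: sorted (dataset, column) pairs that carry the given tag."""
        return sorted(
            (ds.name, col.name)
            for ds in self._datasets.values()
            for col in ds.columns
            if tag in col.tags
        )
```

After registering an `analytics.users` dataset whose `email` column is tagged `{"pii"}`, `columns_tagged("pii")` returns `[("analytics.users", "email")]`. The same metadata serves both discovery and governance, which is why the catalog sits underneath both capabilities.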
A data catalog is important for solving two problems in modern data teams: productivity and governance risk.
Parsing SQL queries provides superpowers for monitoring data health. This post describes how to get started on parsing SQL for data observability. The query history of a data warehouse is a rich source of…
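As a rough illustration of mining query history, the sketch below counts how often each table is read. It is a deliberately naive, regex-based stand-in for a real SQL parser, not the approach the post describes: it picks up identifiers after `FROM` and `JOIN` and discards CTE names, but it ignores comments and quoted identifiers and will mis-handle constructs such as `EXTRACT(YEAR FROM col)`. The function names `tables_read` and `table_usage` are hypothetical; a proper SQL parser is the robust choice.

```python
import re
from collections import Counter

# Identifier after FROM or JOIN, e.g. "FROM sales.orders o" -> "sales.orders".
# Subqueries such as "FROM (SELECT ..." are skipped because "(" is not matched.
TABLE_REF = re.compile(r"\b(?:from|join)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)

# CTE definitions: "WITH name AS (" or ", name AS (".
CTE_NAME = re.compile(r"(?:\bwith\s+|,\s*)([A-Za-z_]\w*)\s+as\s*\(", re.IGNORECASE)


def tables_read(query):
    """Return the lower-cased table names a single query appears to read."""
    refs = {name.lower() for name in TABLE_REF.findall(query)}
    ctes = {name.lower() for name in CTE_NAME.findall(query)}
    return refs - ctes  # CTEs are query-local, not warehouse tables


def table_usage(query_history):
    """Count, across the query history, how many queries read each table."""
    counts = Counter()
    for query in query_history:
        counts.update(tables_read(query))
    return counts
```

Per-table read counts like these are one simple observability signal, for example to spot tables that no query touches any more.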
Metadata in a data lake is important for the productivity of everyone in the data ecosystem. The different types of metadata, systems to store them, and their consumers can be very confusing. How is a…
Why do you need a data catalog? A data catalog is important for solving two problems in modern data teams: poor productivity of people and poor ROI of data, and governance risk. Productivity: Analysts…
AWS Lake Formation permissions control access to datasets in your AWS data lake at table- and column-level granularity. For a quick primer, read the Lake Permissions by Example blog post. Once…