Data Observability Articles for Data Lakes

What is Data Observability?

Data Observability is a set of processes for understanding the health of an organization's data. Factors that determine data health include:

  • Freshness: Tracks whether data pipelines read the latest, most up-to-date version of the data.
  • Field Statistics: Statistics such as the ratio of null values and min/max should be in line with historical trends.
  • Volume: The number of rows or the data size of a dataset should be in line with historical trends.
  • Lineage: Data lineage provides context when triaging data health issues.
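
As a rough illustration of the first three factors, the sketch below compares the latest run of a dataset against earlier runs. It is a minimal example under stated assumptions, not a reference implementation: the `Snapshot` record, the function names, the 24-hour freshness window, and the 3-standard-deviation threshold are all hypothetical choices made for this example. Freshness is approximated here by the age of the latest load, and lineage is not modeled because it requires pipeline metadata.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean, stdev


@dataclass
class Snapshot:
    """Hypothetical per-run summary of a dataset (not from any library)."""
    loaded_at: datetime   # when this version of the data was written
    row_count: int        # volume
    null_ratio: float     # field statistic: fraction of nulls in one field


def is_fresh(latest: Snapshot, now: datetime, max_age: timedelta) -> bool:
    """Freshness proxy: the latest load is no older than max_age (assumed rule)."""
    return now - latest.loaded_at <= max_age


def in_line_with_history(value: float, history: list[float],
                         z_threshold: float = 3.0) -> bool:
    """True if value lies within z_threshold standard deviations of history."""
    if len(history) < 2:
        return True  # too little history to judge; assume healthy
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return value == mu
    return abs(value - mu) / sigma <= z_threshold


def check_health(latest: Snapshot, history: list[Snapshot], now: datetime,
                 max_age: timedelta = timedelta(hours=24)) -> dict[str, bool]:
    """Run the freshness, volume, and field-statistics checks."""
    return {
        "freshness": is_fresh(latest, now, max_age),
        "volume": in_line_with_history(
            latest.row_count, [s.row_count for s in history]),
        "field_statistics": in_line_with_history(
            latest.null_ratio, [s.null_ratio for s in history]),
    }
```

A caller would build one `Snapshot` per pipeline run and pass the most recent one, plus the earlier ones, to `check_health`. A `False` entry in the result marks the factor to triage first.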
