Data Lineage

Overview

Data Lineage is an open source application to query and visualize data lineage in databases, data warehouses and data lakes in AWS and GCP.

Features

  • Generate lineage from SQL query history.
  • Support ANSI SQL queries.
  • Integrate with Jupyter Notebook.
  • Visualize data lineage using Plotly.
  • Select a source or target table.
  • Pan, zoom, and select within the graph.
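Generating lineage from query history comes down to reading each query that writes a table and recording which tables it read from. This README does not describe how Data Lineage parses SQL internally. The sketch below is a simplified, hypothetical illustration of the idea and not the project's parser. It assumes a regex that only understands simple `INSERT INTO ... SELECT ... FROM/JOIN ...` statements. The function name `lineage_edges` is invented for this example, and real ANSI SQL needs a proper SQL parser.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical, deliberately naive patterns: they only recognize the target of
# "INSERT INTO <table>" and the tables named after FROM or JOIN. They ignore
# CTEs, aliases in subqueries, quoting, and every other SQL feature.
_TARGET = re.compile(r"\binsert\s+into\s+([\w.]+)", re.IGNORECASE)
_SOURCES = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)


def lineage_edges(queries: Iterable[str]) -> List[Tuple[str, str]]:
    """Return unique (source_table, target_table) edges, in first-seen order."""
    edges = {}  # dict used as an ordered set
    for query in queries:
        target = _TARGET.search(query)
        if target is None:
            continue  # read-only queries (plain SELECTs) add no lineage edges
        for source in _SOURCES.findall(query):
            edges[(source.lower(), target.group(1).lower())] = None
    return list(edges)


history = [
    "INSERT INTO staging.orders SELECT * FROM raw.orders",
    "INSERT INTO mart.revenue SELECT o.id, c.name FROM staging.orders o "
    "JOIN raw.customers c ON o.cid = c.id",
    "SELECT count(*) FROM mart.revenue",
]
print(lineage_edges(history))
```

The edges form a directed graph from source tables to target tables. This is also the form needed to plot lineage or to select a source or target table.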

Check out an example data lineage notebook.

Use Cases

Data Lineage enables the following use cases:

  • Business Rules Verification
  • Change Impact Analysis
  • Data Quality Verification

For an example of data lineage in production, check out the post on using data lineage for cost control.
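Change impact analysis can be framed as a graph traversal. You start at the table you plan to change and follow lineage edges downstream to find every table that depends on it. The following is a minimal sketch of that idea and not Data Lineage's API. It assumes lineage is available as a list of `(source_table, target_table)` pairs, and the function name `downstream` is hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple


def downstream(edges: List[Tuple[str, str]], changed: str) -> Set[str]:
    """Return every table that reads, directly or indirectly, from `changed`."""
    # Adjacency list: source table -> tables built from it.
    children: Dict[str, List[str]] = defaultdict(list)
    for source, target in edges:
        children[source].append(target)

    # Breadth-first search; the `affected` set also guards against cycles.
    affected: Set[str] = set()
    queue = deque([changed])
    while queue:
        table = queue.popleft()
        for child in children[table]:
            if child not in affected:
                affected.add(child)
                queue.append(child)
    return affected


edges = [
    ("raw.orders", "staging.orders"),
    ("staging.orders", "mart.revenue"),
    ("raw.customers", "mart.revenue"),
]
print(sorted(downstream(edges, "raw.orders")))
```

To traverse upstream instead, for example to check which inputs feed a suspect table during data quality verification, reverse each edge before building the adjacency list.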

Supported Databases

  • PostgreSQL

Coming Soon

  • AWS Athena
  • MySQL/MariaDB
  • AWS Redshift