Data Lineage

Overview

Data Lineage is an open-source application for querying and visualizing data lineage in databases, data warehouses, and data lakes on AWS and GCP.

Data Lineage Web Application

We are building a data lineage web application for Snowflake, Redshift and BigQuery. Interested? Take this survey to sign up.

Features

  • Lineage and catalog are stored in a database.
  • API and SDK to integrate with any ETL framework.
  • Use the query parser module to generate lineage from SQL query history.
  • Supports ANSI SQL queries.
  • Analyze data lineage graphs with Jupyter Notebook.
  • Browse the data lineage of datasets using a web browser in server mode.
  • Visualize data lineage using Plotly.
  • Select a source or target table.
  • Pan, zoom, and select within the graph.
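
To illustrate the idea of generating lineage from SQL query history, here is a minimal sketch. It is not the data-lineage query parser and does not use its API: `extract_edges` and the sample `query_history` are hypothetical, and the regular expressions only handle simple `INSERT INTO ... SELECT ... FROM/JOIN ...` statements, whereas a real parser has to cope with full ANSI SQL.

```python
import re

# Hypothetical helper, not part of data-lineage: handles only simple
# "INSERT INTO <target> ... SELECT ... FROM/JOIN <source>" statements.
TARGET_RE = re.compile(r"\binsert\s+into\s+([\w.]+)", re.IGNORECASE)
SOURCE_RE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)


def extract_edges(query):
    """Return a sorted list of (source, target) pairs for one query."""
    target = TARGET_RE.search(query)
    if target is None:
        return []  # read-only query, so it contributes no lineage
    sources = set(SOURCE_RE.findall(query))
    return sorted((source, target.group(1)) for source in sources)


# Made-up query history for illustration.
query_history = [
    "INSERT INTO daily_sales SELECT * FROM orders "
    "JOIN customers ON orders.cid = customers.id",
    "SELECT count(*) FROM daily_sales",
    "INSERT INTO sales_report SELECT region, sum(total) "
    "FROM daily_sales GROUP BY region",
]

edges = [edge for query in query_history for edge in extract_edges(query)]
print(edges)
```

Each pair is one edge of the lineage graph, pointing from a source table to the table it feeds.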

Check out the following example notebooks:

Use Cases

data-lineage enables the following use cases:

  • Business Rules Verification
  • Change Impact Analysis
  • Data Quality Verification
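
As a rough illustration of change impact analysis, the sketch below walks a lineage graph to find every table downstream of a changed table. It uses only the Python standard library, not the data-lineage API; the `downstream` function and the table names are hypothetical.

```python
from collections import defaultdict, deque


def downstream(edges, start):
    """Return every table reachable from `start` along source -> target edges."""
    children = defaultdict(set)
    for source, target in edges:
        children[source].add(target)

    seen, queue = set(), deque([start])
    while queue:
        table = queue.popleft()
        # Visit only children not reached before, so cycles terminate.
        for child in children[table] - seen:
            seen.add(child)
            queue.append(child)
    return seen


# Made-up lineage edges: (source table, target table).
lineage = [
    ("orders", "daily_sales"),
    ("customers", "daily_sales"),
    ("daily_sales", "sales_report"),
    ("inventory", "stock_report"),
]

impacted = downstream(lineage, "orders")
print(sorted(impacted))
```

Under these assumptions, a change to `orders` affects `daily_sales` and `sales_report` but leaves `stock_report` untouched.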

Check out the post on using data lineage for cost control for an example of how data lineage can be used in production.

Modes

data-lineage can be run in two modes:

  • Jupyter Notebook: In this mode, you can visualize and analyze data lineage using graph operations.
  • Server: In this mode, your team can browse the lineage of all datasets using a web browser.

Supported Databases

  • PostgreSQL
  • AWS Redshift
  • Snowflake

Coming Soon

  • AWS Athena
  • MySQL/MariaDB