Data Lineage
Overview
Data Lineage is an open-source application to query and visualize data lineage in databases, data warehouses, and data lakes in AWS and GCP.
Data Lineage Web Application
We are building a data lineage web application for Snowflake, Redshift and BigQuery. Interested? Take this survey to sign up.
Features
- Generate lineage from SQL query history
- Support ANSI SQL queries
- Analyze data lineage graphs with Jupyter Notebook
- Browse the data lineage of datasets using a web browser in server mode
- Visualize data lineage using Plotly
- Select a source or target table
- Pan, zoom, and select within the graph
Check out an example data lineage notebook.
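To make the idea of generating lineage from SQL query history concrete, here is a minimal sketch. It is not data-lineage's implementation or API: the `extract_edges` helper, the regular expression, and the sample queries are all hypothetical, and the pattern only handles simple `INSERT INTO ... SELECT ... FROM` statements with a single source table. A real tool would rely on a proper SQL parser.

```python
import re

# Toy pattern (an assumption for illustration): matches
# "INSERT INTO <target> ... FROM <source>" with one source table only.
INSERT_SELECT = re.compile(
    r"insert\s+into\s+([\w.]+).*?\bfrom\s+([\w.]+)",
    re.IGNORECASE | re.DOTALL,
)


def extract_edges(queries):
    """Return a list of (source_table, target_table) lineage edges."""
    edges = []
    for query in queries:
        match = INSERT_SELECT.search(query)
        if match:
            target, source = match.group(1), match.group(2)
            edges.append((source, target))
    return edges


# Hypothetical query history; the last query writes nothing, so it adds no edge.
query_history = [
    "INSERT INTO staging.orders SELECT * FROM raw.orders",
    "INSERT INTO mart.daily_sales SELECT day, SUM(amount) "
    "FROM staging.orders GROUP BY day",
    "SELECT COUNT(*) FROM mart.daily_sales",
]

edges = extract_edges(query_history)
for source, target in edges:
    print(f"{source} -> {target}")
```

Each edge points from the table that was read to the table that was written, so the full query history yields a directed graph of tables.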
Use Cases
data-lineage enables the following use cases:
- Business Rules Verification
- Change Impact Analysis
- Data Quality Verification
Check out the post on using data lineage for cost control for an example of how data lineage can be used in production.
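As an illustration of change impact analysis, the sketch below walks a lineage graph breadth-first to find every table downstream of a table that is about to change. It uses only the Python standard library; the `downstream_tables` function and the table names are hypothetical and do not reflect how data-lineage stores or exposes its graph.

```python
from collections import defaultdict, deque


def downstream_tables(edges, changed_table):
    """Return every table reachable from changed_table via lineage edges."""
    children = defaultdict(list)
    for source, target in edges:
        children[source].append(target)

    seen = {changed_table}
    queue = deque([changed_table])
    while queue:
        table = queue.popleft()
        for child in children[table]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    # The changed table itself is not part of its own impact set.
    return seen - {changed_table}


# Hypothetical lineage edges as (source_table, target_table) pairs.
lineage = [
    ("raw.orders", "staging.orders"),
    ("raw.customers", "staging.customers"),
    ("staging.orders", "mart.daily_sales"),
    ("staging.customers", "mart.daily_sales"),
    ("mart.daily_sales", "report.revenue"),
]

impacted = downstream_tables(lineage, "staging.orders")
print(sorted(impacted))
```

With these sample edges, a change to `staging.orders` affects `mart.daily_sales` and `report.revenue`, while the customer tables are untouched.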
Modes
data-lineage can be run in two modes:
- Jupyter Notebook: In this mode, you can visualize and analyze data lineage using graph operations.
- Server: In this mode, your team can browse the lineage of all datasets using a browser.
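One example of the kind of graph operation that could be run in a notebook is tracing a table back to its original sources, which is useful when verifying data quality. The sketch below is an assumption-laden illustration in plain Python: `upstream_sources` and the sample edges are hypothetical, and the actual notebook workflow may use a graph library instead.

```python
def upstream_sources(edges, table):
    """Return the root tables (those with no parents) that feed into table."""
    parents = {}
    for source, target in edges:
        parents.setdefault(target, set()).add(source)

    roots = set()
    seen = {table}
    stack = [table]
    while stack:
        current = stack.pop()
        current_parents = parents.get(current, set())
        # A table with no recorded parents is an original source.
        if not current_parents and current != table:
            roots.add(current)
        for parent in current_parents:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return roots


# Hypothetical lineage edges as (source_table, target_table) pairs.
sample_edges = [
    ("raw.orders", "staging.orders"),
    ("raw.customers", "staging.customers"),
    ("staging.orders", "mart.daily_sales"),
    ("staging.customers", "mart.daily_sales"),
    ("mart.daily_sales", "report.revenue"),
]

sources = upstream_sources(sample_edges, "report.revenue")
print(sorted(sources))
```

Here `report.revenue` traces back to `raw.customers` and `raw.orders`, the only tables in the sample graph that nothing writes to.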
Supported Databases
- PostgreSQL
- AWS Redshift
- Snowflake
Coming Soon
- AWS Athena
- MySQL/MariaDB