Data Lineage is an open-source application for querying and visualizing data lineage in databases, data warehouses, and data lakes on AWS and GCP.
Data Lineage Web Application
We are building a data lineage web application for Snowflake, Redshift and BigQuery. Interested? Take this survey to sign up.
- Lineage & Catalog stored in a database.
- API and SDK to integrate with any ETL framework.
- Use the query parser module to generate lineage from SQL query history.
- Supports ANSI SQL queries.
- Analyze data lineage graphs with Jupyter Notebook.
- Browse the data lineage of datasets using a web browser in server mode.
- Visualize data lineage using Plotly.
  - Select source or target table.
  - Pan, zoom, and select parts of the graph.
Check out the following example notebooks:
- Use the API to create the lineage graph
- Use the query parser module to generate lineage from SQL ETL queries.
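To give a feel for what "generating lineage from SQL ETL queries" means, here is a minimal, self-contained sketch. It does not use the data-lineage query parser or API; `extract_edges` and `build_lineage` are hypothetical helpers, and the regex approach is an assumption that only handles simple `INSERT INTO ... SELECT ... FROM/JOIN` statements. A real parser handles far more of the SQL grammar.

```python
import re
from collections import defaultdict

# Hypothetical helpers for illustration only; not part of data-lineage.
# Assumption: each query is a simple "INSERT INTO <target> SELECT ... FROM/JOIN <source>".
INSERT_RE = re.compile(r"\binsert\s+into\s+([\w.]+)", re.IGNORECASE)
SOURCE_RE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)


def extract_edges(query):
    """Return the set of (source, target) table pairs found in one query."""
    target = INSERT_RE.search(query)
    if target is None:
        # Not an INSERT statement, so it contributes no lineage edge.
        return set()
    return {(source, target.group(1)) for source in SOURCE_RE.findall(query)}


def build_lineage(queries):
    """Map each target table to the set of tables it reads from."""
    lineage = defaultdict(set)
    for query in queries:
        for source, target in extract_edges(query):
            lineage[target].add(source)
    return dict(lineage)


# Made-up query history used only to exercise the sketch.
queries = [
    "INSERT INTO staging.orders SELECT * FROM raw.orders",
    "INSERT INTO mart.revenue SELECT o.id, c.region "
    "FROM staging.orders o JOIN staging.customers c ON o.cid = c.id",
]
lineage = build_lineage(queries)
```

Each entry in `lineage` records the upstream tables of one target, which is the basic structure a lineage graph is built from.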
data-lineage enables the following use cases:
- Business Rules Verification
- Change Impact Analysis
- Data Quality Verification
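Change impact analysis, for instance, amounts to a graph traversal: given a table that is about to change, find everything downstream of it. The sketch below shows the idea with a plain dictionary and a breadth-first search. The table names, the `downstream` mapping, and the `impacted_tables` function are all hypothetical and are not part of the data-lineage API.

```python
from collections import deque

# Hypothetical lineage graph: each key is a table, each value lists the
# tables derived directly from it.
downstream = {
    "raw.orders": ["staging.orders"],
    "staging.orders": ["mart.revenue", "mart.churn"],
    "mart.revenue": ["dashboard.finance"],
}


def impacted_tables(graph, changed):
    """Return every table reachable from `changed`, in breadth-first order."""
    seen = set()
    order = []
    queue = deque([changed])
    while queue:
        table = queue.popleft()
        for child in graph.get(table, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


impacted = impacted_tables(downstream, "staging.orders")
```

Running the same traversal over reversed edges would list the upstream sources of a table, which is the direction that matters when tracing a data quality problem back to its origin.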
Check out the post on using data lineage for cost control for an example of how data lineage can be used in production.
data-lineage can be run in two modes:
- Jupyter Notebook: In this mode, you can visualize and analyze data lineage using graph operations.
- Server: In this mode, your team can browse the lineage of all datasets in a web browser.
- AWS Redshift
- AWS Athena