
Data Lineage


Tokern Lineage Engine is a fast, easy-to-use platform for collecting, visualizing and analyzing column-level data lineage in databases, data warehouses and data lakes on AWS and GCP.

Tokern Lineage Engine helps you browse column-level data lineage

Use column-level lineage to add rich context and powerful automation to common data management tasks:

  • Track and debug data quality.
  • Track PII, PHI and other sensitive data and their access rights.
  • Save costs by removing unused ETL pipelines and datasets.
  • Enrich your data dictionary to help users find the right dataset for their analysis.

Check out the post on using data lineage for cost control as an example.

Calling for Tokern Lineage Engine beta users

We are building a data lineage platform for Snowflake, Redshift and BigQuery. Interested? Take this survey to sign up.

Need help with your data governance and lineage strategy?

If you would like hands-on assistance setting up Tokern projects or open source data catalogs like Datahub or Amundsen, or extending Tokern Lineage Engine with new functionality, please get in touch through this form.

Resources

Demo of Tokern Lineage App


Download the docker-compose file from the GitHub repository.

# In a new directory, download the compose file and save it as docker-compose.yml
wget https://raw.githubusercontent.com/tokern/data-lineage/master/install-manifests/docker-compose/catalog-demo.yml -O docker-compose.yml
# or, with curl
curl https://raw.githubusercontent.com/tokern/data-lineage/master/install-manifests/docker-compose/catalog-demo.yml -o docker-compose.yml

Run docker-compose

docker-compose up -d

Check that the containers are running.

docker ps
CONTAINER ID   IMAGE                            COMMAND   CREATED       STATUS       PORTS                  NAMES
3f4e77845b81   tokern/data-lineage-viz:latest   ...       4 hours ago   Up 4 hours   0.0.0.0:8000->80/tcp   tokern-data-lineage-visualizer
1e1ce4efd792   tokern/data-lineage:latest       ...       5 days ago    Up 5 days                           tokern-data-lineage
38be15bedd39   tokern/demodb:latest             ...       2 weeks ago   Up 2 weeks                          tokern-demodb
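
Checking the docker ps listing by eye can be scripted. The following is a minimal sketch, not part of Tokern: the helper name missing_containers is hypothetical, and the three container names are taken from the sample output above, so they may differ if the compose file changes. It reads container names on standard input and prints any expected name that is absent.

```shell
# Hypothetical helper: reads running container names (one per line) on stdin
# and prints each expected name that is not in the list.
# The expected names are assumed from the sample `docker ps` output.
missing_containers() {
  running=$(cat)
  for name in tokern-data-lineage-visualizer tokern-data-lineage tokern-demodb; do
    # -x: match the whole line, -F: treat the name as a fixed string, -q: quiet
    printf '%s\n' "$running" | grep -qxF -- "$name" || printf '%s\n' "$name"
  done
}

# Usage (requires Docker):
#   docker ps --format '{{.Names}}' | missing_containers
```

Empty output means all three containers are listed as running; otherwise each missing name is printed on its own line.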

Try out Tokern Lineage App

Head to http://localhost:8000/ to open the Tokern Lineage App.
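
The containers may need a moment after docker-compose up before the app responds. As a rough sketch, assuming the app answers a plain HTTP GET on the root URL with a success status, a small polling helper can wait for it. The function name wait_for_url and the default retry count and delay are illustrative choices, not part of Tokern.

```shell
# Hypothetical helper: polls a URL until it returns an HTTP success status.
# Arguments: URL, max attempts (default 30), seconds between attempts (default 2).
wait_for_url() {
  url=$1
  attempts=${2:-30}
  delay=${3:-2}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: treat HTTP errors as failure, -s: silent, -o /dev/null: discard the body
    if curl -fs -o /dev/null "$url"; then
      echo "ready: $url"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "not reachable after $attempts attempts: $url" >&2
  return 1
}

# Usage:
#   wait_for_url http://localhost:8000/
```

The function returns 0 as soon as one request succeeds and 1 once the attempts are exhausted, so it can gate later steps in a setup script.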

Jupyter Notebooks and case studies

Getting Started

Check out Installation for multiple options to start the data_lineage engine and browse lineage graphs.

Check out the following example notebooks to analyze lineage graphs:

Features

  • Lineage and catalog are stored in a database.
  • Integrates with open source data catalogs.
  • API and SDK to integrate with any ETL framework.
  • Use the query parser module to generate lineage from SQL query history.
  • Supports ANSI SQL queries.

Supported Databases

  • PostgreSQL
  • AWS Redshift
  • Snowflake

Coming Soon

  • AWS Athena
  • MySQL/MariaDB