Tutorial: How to set up Access Permissions for AWS Lake Formation with Examples

Rajat Venkatesh11/26/2019 2 Min Read

AWS Lake Formations helps to set up a secure data lake on AWS S3. One of the main goals of the product is Simplified Security Management. The central tenet of this goal is to define security, governance, and audit policies in a single location. Lake Formation permissions go beyond IAM policies to manage secure access to databases, tables, and columns.
The previous blog post explained how to set up a data lake with NYC Trip Data.

“fort” is licensed under CC0 1.0

This tutorial uses the same data lake and shows how to set up basic permissions for two personas: data engineer and data analyst. It explains resources and permissions and sets up two users and their permissions (data engineer and data analyst) to explain the concepts and how they help to secure the data lake.


Resources and Permissions

There are two types of resources in the AWS Lake Formation: Metadata is stored in AWS Glue Data Catalog and consists of databases, tables, and columns. Data is stored in AWS S3 locations or databases.

Metadata

The table below describes the metadata permissions available for each resource in the catalog.


Data

Data Location permissions are applicable to AWS S3 locations registered with Lake Formation. The only permission available as of this article is DATA_LOCATION_ACCESS.


DATA_LOCATION_ACCESS allows users to create tables without the need to have IAM permissions for an S3 location.


Similarly, SELECT, INSERT and DELETE permissions are available only on registered S3 locations.


An important point is that Lake Formation users do not need IAM permissions to access tables in a registered data location. Lake Formation's permissions ensure secure access instead of AWS IAM permissions.
Note that for database locations such as RDS or Redshift, the permissions are determined by the user specified in the JDBC connection string and the GRANT permissions in the database.

Permissions on NYC Trip Data Data Lake

Pre-requisites


AWS Glue, AWS Athena, and AWS Lake Formation have been set up. The NYC Taxi Trip data set is available in the data lake in the taxidata database. lakeadmin is the AWS Lake Formation administrator. lakecli and athenacli are set up and configured in your system.

Create Policies and Users

First, create a few managed policies. Navigate to AWS Console > IAM > Policies > Create Policy and create the following policies


  1. DataEngineerLakeFormationPolicy
{
    "Version": "2012-10-17",
    "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                        "lakeformation:GetDataAccess",
                        "lakeformation:GrantPermissions",
                        "lakeformation:RevokePermissions",
                        "lakeformation:BatchGrantPermissions",
                        "lakeformation:BatchRevokePermissions",
                        "lakeformation:ListPermissions" ],
                "Resource": "*"
            }
    ]
}


  1. DataEngineerPassRole


{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PassRolePermissions",
            "Effect": "Allow",
            "Action": [
                "iam:PassRole"
            ],
            "Resource": [
                "arn:aws:iam::account-id:role/workflow_role"
            ]
        }
    ]
}


  1. DataAnalystGlueAccess


{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "lakeformation:GetDataAccess",
                "glue:GetTable",
                "glue:GetTables",
                "glue:SearchTables",
                "glue:GetDatabase",
                "glue:GetDatabases",
                "glue:GetPartitions"
            ],
            "Resource": "*"
        }
    ]
}


  1. S3NycBucketRead is required because nyc-tlc is not registered as a data location and explicit IAM permissions are required.


{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::nyc-tlc",
                "arn:aws:s3:::nyc-tlc/*"
            ]
        }
    ]
}


Navigate to AWS Console > IAM > Create User and create the following users with attached policies.



Register a data location

Create an S3 bucket called prefix_datalake. Choose a unique prefix. In AWS Console, go to AWS Lake Formation > Databases > Create Database Create a database with the following details:


Note that no explicit IAM permissions are set up for the users.


Data Engineer Setup

This section sets up the permission for datalake_engg. datalake_engg creates a table in taxilake from a table in taxidata.


datalake_engg needs CREATE_TABLE permission to create and manage tables in a database.


# lakeadmin in lakecli
lakecli> select * from database_privileges where principal like '%datalake_engg%';
0 rows in set
Time: 0.001s
lakecli> GRANT CREATE_TABLE ON DATABASE 'taxilake' to 'user/datalake_engg';
GRANT
Time: 3.071s
lakecli> select * from database_privileges where principal like '%datalake_engg%';
+----+-------------+--------------------+--------------+-------+
| id | schema_name | principal          | permission   | grant |
+----+-------------+--------------------+--------------+-------+
| 4  | taxilake    | user/datalake_engg | CREATE_TABLE | 0     |
+----+-------------+--------------------+--------------+-------+
1 row in set
Time: 0.027s
lakecli> GRANT SELECT ON TABLE 'taxidata'.'csv_misc' to 'user/datalake_engg';
GRANT
Time: 3.279s
lakecli> select * from table_privileges where principal like '%datalake_engg%';
+----+-------------+------------+--------------------+------------+-------+
| id | schema_name | table_name | principal          | permission | grant |
+----+-------------+------------+--------------------+------------+-------+
| 3  | taxidata    | csv_misc   | user/datalake_engg | SELECT     | 0     |
+----+-------------+------------+--------------------+------------+-------+
1 row in set
Time: 0.024s
#datalake_engg in athenacli
us-east-2:default> select zone from taxidata.csv_misc limit 10;
+---------------------------+
| zone                      |
+---------------------------+
| "Newark Airport"          |
| "Jamaica Bay"             |
| "Allerton/Pelham Gardens" |
| "Alphabet City"           |
| "Arden Heights"           |
| "Arrochar/Fort Wadsworth" |
| "Astoria"                 |
| "Astoria Park"            |
| "Auburndale"              |
| "Baisley Park"            |
+---------------------------+
10 rows in set
Time: 4.064s
us-east-2:default> create table taxilake.zones as select zone from taxidata.csv_misc;
0 rows in set
Time: 11.644s
us-east-2:default> select count(*) from taxilake.zones;
+-----------+
| _col0     |
+-----------+
| 107003212 |
+-----------+
1 row in set
Time: 4.030s
us-east-2:default> select count(*) from taxidata.csv_misc;
+-----------+
| _col0     |
+-----------+
| 107003212 |
+-----------+
1 row in set
Time: 4.844s


The data engineer automatically gets SELECT, INSERT, DELETE, ALTER and DROP permissions with GRANT capability.


\r:iamdb> select * from table_privileges where principal like '%datalake_engg%';
+----+-------------+------------+--------------------+------------+-------+
| id | schema_name | table_name | principal          | permission | grant |
+----+-------------+------------+--------------------+------------+-------+
| 1  | taxilake    | zones      | user/datalake_engg | ALL        | 1     |
| 2  | taxilake    | zones      | user/datalake_engg | ALTER      | 1     |
| 3  | taxilake    | zones      | user/datalake_engg | DELETE     | 1     |
| 4  | taxilake    | zones      | user/datalake_engg | DROP       | 1     |
| 5  | taxilake    | zones      | user/datalake_engg | INSERT     | 1     |
| 6  | taxilake    | zones      | user/datalake_engg | SELECT     | 1     |
| 7  | taxidata    | csv_misc   | user/datalake_engg | SELECT     | 0     |
+----+-------------+------------+--------------------+------------+-------+


Analyst/Data Scientist

datalake_user needs SELECT permission on a table to read data.


# lakeadmin in lakecli
lakecli> select * from table_privileges where principal like '%datalake_user%';
0 rows in set
Time: 0.002s


datalake_user also cannot run any queries in Athena.


# datalake_user in athenacli
us-east-2:default> SELECT COUNT(*) FROM taxidata.csv_misc;
Insufficient permissions to execute the query. Insufficient Lake Formation permission(s) on csv_misc


datalake_user has access to the table data once SELECT permission on the table is granted.


# lakeadmin in lakecli
lakecli> GRANT SELECT ON TABLE 'taxilake'.'zones' to 'user/datalake_user';
GRANT
Time: 1.499s
lakecli> select * from table_privileges where principal like '%datalake_user%';
+----+-------------+------------+--------------------+------------+-------+
| id | schema_name | table_name | principal | permission | grant |
+----+-------------+------------+--------------------+------------+-------+
| 2 | taxilake | csv_misc | user/datalake_user | SELECT | 0 |
+----+-------------+------------+--------------------+------------+-------+
1 row in set
Time: 0.024s
# datalake_user in athenacli
us-east-2:default> SELECT COUNT(*) FROM taxilake.zones;
+-----------+
| _col0     |
+-----------+
| 107003212 |
+-----------+
1 row in set
Time: 6.932s


Summary

AWS Lake Formation Permissions are better suited than IAM permissions to secure a data lake. The two main reasons are Lake Formation Permissions are on logical objects like a database, table, or column instead of files and directories. Lake Formation Permissions provide granular control for column-level access. This blog explores the permissions required for a basic setup of a dataset, administrator, data engineer, and analyst. Get in touch with us to discuss further how to secure your data lake using Lake Formation.


Similar Posts

Get in touch for bespoke support for PII Catcher

We can help discover, manage and secure sensitive data in your data warehouse.