June 12, 2020

AWS Glue with an example

AWS Glue is a fully managed, serverless ETL service, and perhaps most importantly it is widely used in data lake ecosystems. Its high-level capabilities are covered in one of my previous posts here; in this post I want to detail the Glue Catalog and Glue Jobs, and walk through an example that illustrates a simple job.

Glue Catalog

Glue Jobs

Glue Example

Here is an example of a Glue PySpark job which reads from S3, filters the data and writes to DynamoDB. The job can be created from the console or, more typically, with infrastructure-as-code tools like AWS CloudFormation, Terraform etc. The code can be found here.
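
For readers who just want the shape of such a job, here is a minimal sketch of the read, filter and write steps. It is not the linked code, only an illustration under assumptions: the bucket path `s3://my-example-bucket/input/`, the table name `my-example-table`, the `status` column and the `is_active` helper are all hypothetical placeholders, and the input is assumed to be CSV with a header row.

```python
import sys


def is_active(record):
    """Filter predicate: keep rows whose (hypothetical) 'status' column is 'active'."""
    return record["status"] == "active"


def main():
    # The awsglue libraries only exist inside the Glue runtime,
    # so they are imported here rather than at module level.
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.transforms import Filter
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # 1. Read CSV files from S3 into a DynamicFrame (path is a placeholder).
    source = glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={"paths": ["s3://my-example-bucket/input/"]},
        format="csv",
        format_options={"withHeader": True},
    )

    # 2. Keep only the records that satisfy the predicate.
    filtered = Filter.apply(frame=source, f=is_active)

    # 3. Write the result to a DynamoDB table (table name is a placeholder).
    glue_context.write_dynamic_frame_from_options(
        frame=filtered,
        connection_type="dynamodb",
        connection_options={"dynamodb.output.tableName": "my-example-table"},
    )

    job.commit()


# Assumption: Glue launches the script with a --JOB_NAME argument,
# so the job only runs when that argument is present.
if __name__ == "__main__" and "--JOB_NAME" in sys.argv:
    main()
```

Keeping the filter predicate as a plain function means it can be checked locally without a Glue environment, for example `is_active({"status": "active"})`. The DynamoDB table must already exist, and the job's IAM role needs read access to the bucket and write access to the table.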