May 2, 2020 · AWS EMR

ETL Options in AWS

Enterprises and organizations all want to extract value from their data. To do that, the data usually has to be transformed and loaded into a data warehouse or another modern data platform, and then analyzed for insights. In this article, I want to discuss the AWS services that cater to ETL needs, focusing on the core ETL services: EMR, Glue, and Lambda.

EMR (Elastic MapReduce)

Glue

Lambda

Final thoughts

Use the right tool for the right job. Lambda functions are great for lightweight ETL tasks, and they integrate well with services like SQS, SNS, and Step Functions. They can be very cost-effective as well. If you already have a persistent EMR cluster in the environment, any new medium-to-heavy ETL workloads should leverage it. If you are starting fresh and don't have an EMR presence in the environment, Glue is the best tool. Right now the only disadvantage I see with Glue is the absence of an auto scaling feature, which I am sure AWS is working on. A lot of new features, such as streaming and machine learning capabilities, were added recently. Also, it is OK to have both EMR and Glue in the ecosystem.
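
To make the "lightweight ETL in Lambda" case concrete, here is a minimal sketch of an SQS-triggered handler in Python. It assumes each SQS message body is a JSON document; the field names (order_id, amount_cents, country) and the transform itself are hypothetical, chosen only for illustration. The load step is left as a comment because the target (S3, a warehouse, and so on) depends on your environment.

```python
import json


def transform(record: dict) -> dict:
    """Hypothetical transform: normalize one raw order record."""
    return {
        "order_id": str(record["order_id"]),
        "amount_usd": round(float(record["amount_cents"]) / 100, 2),
        "country": record.get("country", "unknown").strip().upper(),
    }


def lambda_handler(event, context):
    """Entry point for an SQS-triggered Lambda.

    SQS delivers a batch of messages under event["Records"]; each
    message payload arrives as a string in record["body"].
    """
    rows = [transform(json.loads(msg["body"])) for msg in event.get("Records", [])]
    # A real job would load `rows` into its target here (assumption: not shown).
    return {"processed": len(rows), "rows": rows}
```

Because the handler is a plain function, it can be exercised locally by passing a dict shaped like an SQS event, which keeps small transforms like this easy to test before deploying.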