February 11, 2020 · AWS EMR

Managing AWS EMR Clusters in Dev Environments

In non-production environments, EMR clusters are often left running around the clock, even though they are typically needed only during office hours on weekdays. Restricting the clusters to office hours on weekdays can drastically reduce AWS costs for development environments.

One way to accomplish this is to have two Lambda functions, one to start the EMR cluster and one to stop it, and to trigger them on a cron schedule. The complete code can be found here.

The code can be used as a reference for:

  1. How to start and stop an EMR cluster using Lambda,
  2. How to spin up an EMR cluster at the click of a button, with all the configuration kept in the Lambda code,
  3. How to use the Serverless Framework to build and deploy Lambdas to AWS.

Starting EMR Cluster
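
As a rough illustration of what the start Lambda could look like, here is a minimal sketch using boto3's EMR client and its `run_job_flow` call. It is not the code from the linked repository. The cluster name, release label, instance types and counts, and the use of the default EMR roles are all assumptions chosen for the example, and the `build_cluster_request` and `start_cluster` helpers are hypothetical names.

```python
import os

# Hypothetical default name; override it through the Lambda's environment.
CLUSTER_NAME = os.environ.get("CLUSTER_NAME", "dev-emr-cluster")


def build_cluster_request(name=CLUSTER_NAME, subnet_id=None):
    """Return keyword arguments for the EMR RunJobFlow API.

    Every value below is an illustrative assumption; adjust to your needs.
    """
    instances = {
        "InstanceGroups": [
            {"Name": "Master", "InstanceRole": "MASTER",
             "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"Name": "Core", "InstanceRole": "CORE",
             "InstanceType": "m5.xlarge", "InstanceCount": 2},
        ],
        # Keep the cluster up after its steps finish so developers can use it.
        "KeepJobFlowAliveWhenNoSteps": True,
        "TerminationProtected": False,
    }
    if subnet_id:
        instances["Ec2SubnetId"] = subnet_id
    return {
        "Name": name,
        "ReleaseLabel": "emr-5.29.0",
        "Applications": [{"Name": "Spark"}],
        "Instances": instances,
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
        "VisibleToAllUsers": True,
    }


def start_cluster(emr, request=None):
    """Launch a cluster with the given EMR client and return its ID."""
    response = emr.run_job_flow(**(request or build_cluster_request()))
    return response["JobFlowId"]


def lambda_handler(event, context):
    # boto3 is imported here so the helpers above can be tested without it.
    import boto3

    cluster_id = start_cluster(boto3.client("emr"))
    return {"cluster_id": cluster_id}
```

Passing the client into `start_cluster` keeps the configuration in one place and lets the function be exercised locally with a stub client before it is deployed.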

Stopping EMR Cluster
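
The stop Lambda could be sketched along the following lines, again with boto3 and not taken from the linked repository. It assumes the dev cluster is identified by a fixed name (`dev-emr-cluster` is a made-up default), looks up active clusters with that name through the `list_clusters` paginator, and terminates them with `terminate_job_flows`. The helper names are hypothetical.

```python
import os

# Hypothetical default name; it must match the name used when starting.
CLUSTER_NAME = os.environ.get("CLUSTER_NAME", "dev-emr-cluster")

# EMR cluster states that mean the cluster is still up or coming up.
ACTIVE_STATES = ["STARTING", "BOOTSTRAPPING", "RUNNING", "WAITING"]


def active_cluster_ids(emr, name=CLUSTER_NAME):
    """Return the IDs of all active clusters whose name equals `name`."""
    ids = []
    paginator = emr.get_paginator("list_clusters")
    for page in paginator.paginate(ClusterStates=ACTIVE_STATES):
        ids.extend(c["Id"] for c in page["Clusters"] if c["Name"] == name)
    return ids


def stop_clusters(emr, name=CLUSTER_NAME):
    """Terminate every active cluster called `name`; return the IDs stopped."""
    ids = active_cluster_ids(emr, name)
    if ids:
        emr.terminate_job_flows(JobFlowIds=ids)
    return ids


def lambda_handler(event, context):
    # boto3 is imported here so the helpers above can be tested without it.
    import boto3

    return {"terminated": stop_clusters(boto3.client("emr"))}
```

Matching on the cluster name is only one option. Filtering on a tag would work as well, but it needs an extra `describe_cluster` call per cluster.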

Serverless code to deploy the Lambdas

We also see developers create their own clusters in a sandbox or dev environment to test things and then forget to shut them down. To avoid this, limit the EMR create privilege and ask developers to run the Lambda to create the cluster. The Lambda code makes sure only one cluster exists at a time: if one is already running, it skips creating another.
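
The "only one cluster" check described above might look like the sketch below. It assumes a cluster counts as already running when an active cluster with the same name exists; the helper names `find_active_cluster` and `ensure_single_cluster` are hypothetical, and `request` stands for whatever `run_job_flow` arguments the Lambda uses.

```python
# EMR cluster states that mean the cluster is still up or coming up.
ACTIVE_STATES = ["STARTING", "BOOTSTRAPPING", "RUNNING", "WAITING"]


def find_active_cluster(emr, name):
    """Return the ID of the first active cluster called `name`, or None."""
    paginator = emr.get_paginator("list_clusters")
    for page in paginator.paginate(ClusterStates=ACTIVE_STATES):
        for cluster in page["Clusters"]:
            if cluster["Name"] == name:
                return cluster["Id"]
    return None


def ensure_single_cluster(emr, request):
    """Create a cluster only if no active one shares request["Name"].

    Returns a (cluster_id, created) tuple.
    """
    existing = find_active_cluster(emr, request["Name"])
    if existing is not None:
        # A cluster is already up, so skip creating another one.
        return existing, False
    return emr.run_job_flow(**request)["JobFlowId"], True
```

The check and the create are two separate API calls, so two invocations at nearly the same moment could still both create a cluster. For a dev environment that is usually an acceptable trade-off.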

The assumption here is that the EMR cluster runs in a public subnet, which might not be the case in most places. If the EMR cluster runs in a private subnet, run the Lambdas in the private subnet as well. The schedule timings, cluster sizes, instance types, and so on can be changed as needed. The Lambdas can also be invoked manually on demand if there is an ad hoc request to have the EMR cluster up outside of the schedule.