
How we saved 64% on our dev/test instances by scheduling our uptime

By Carl Hall on October 2, 2014

There are certain parts of your infrastructure that you’ll need to keep running 24/7 to keep your production environments operating. Other resources, such as those in your staging and dev environments, aren’t needed all the time. These resources are used primarily, or even exclusively, at predictable intervals, generally during business hours when the internal team needs them.

Despite their fluctuating usage, instances in these environments often don’t have auto scaling built in. That means they may keep running during hours when nobody is using them, racking up costs all the while.

During our most recent infrastructure audit, we introduced a process that keeps our staging and dev environments efficient with very little ongoing effort, while significantly lowering our costs along the way. We call that process “Valet,” and here’s how we did it.

Confirm idle instance times

When you’ve identified instances that you suspect are only needed during certain hours, such as those in your dev and staging environments, you can start testing that hypothesis by simply asking around. If, like us, the only person using your staging environment is your Lead QA Engineer, ask her what her schedule is. 8AM to 5PM? Great: chances are your staging instances only need to be running between 8 and 5.

It’s easy to validate an instance’s usage schedule with a Cloudability Usage Analytics report. For example, you can confirm that your staging environment is used only while your Lead QA Engineer is in the office. To generate a Staging Utilization report, apply an Environment = staging filter. You can then click on specific instances to view their individual utilization patterns and check their hourly usage history over a selected date range. If your hypothesis is correct, you’ll find that these instances are consistently unused between 5PM and 8AM, which makes them great candidates for scheduling.
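The same check can be done by hand if you have raw hourly utilization data. The sketch below assumes you have exported (timestamp, CPU percent) samples, for example from a usage report or from CloudWatch’s CPUUtilization metric. Both the input format and the 5% idle threshold are assumptions. It reports the hours of the day that were idle on every day observed.

```python
from collections import defaultdict
from datetime import datetime


def idle_hours(samples, threshold=5.0):
    """Return the hours of day (0-23) in which every sample is below threshold.

    samples: iterable of (datetime, cpu_percent) pairs. This export format is
    an assumption, not a Cloudability or AWS format.
    """
    by_hour = defaultdict(list)
    for ts, cpu in samples:
        by_hour[ts.hour].append(cpu)
    # An hour counts as idle only if it was idle on every day in the sample.
    # Hours with no samples at all are not reported.
    return sorted(h for h, values in by_hour.items() if max(values) < threshold)


# Two days of synthetic data: busy from 8AM to 5PM, idle otherwise.
samples = []
for day in (1, 2):
    for hour in range(24):
        busy = 8 <= hour < 17
        samples.append((datetime(2014, 9, day, hour), 40.0 if busy else 1.0))

print(idle_hours(samples))
```

Requiring an hour to be idle on every observed day is deliberately conservative. A single late night in the sample keeps that hour off the list, which is the behavior you want before scheduling downtime.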

Assign instance schedules with tags

Once you’ve confirmed when your instances need to be running and when they don’t, you can label each instance with its schedule by attaching a tag of the form schedule = X in the Amazon console. You can easily schedule instance downtime with cron, which is available on any Unix- or Linux-based OS (on Windows, look into Scheduled Tasks), and the schedule itself is written in the crontab format:

[Figure: crontab format, with fields for minute, hour, day of month, month, and day of week]

For example, the tag schedule = * 7-17 mon-fri keeps the instance running Monday through Friday from 7AM to 6PM. That gives your Lead QA Engineer a one-hour buffer on either side of her 8-to-5 day. Note that the last minute covered by this crontab is 17:59, because the hour range 7-17 includes every minute of the 17th (5PM) hour. The instance therefore stops at 6PM. If you wrote 7-18 instead, expecting a 6PM stop, the range would also include every minute of the 18th hour, and the instance would run until 7PM.
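The post doesn’t show how Valet reads these tags. The following is a minimal sketch of one way to interpret them. It assumes the tag value holds three space-separated fields (minute, hour, day of week), which is what the example * 7-17 mon-fri appears to use. It supports *, ranges, and comma lists, but not steps such as */15 or wrap-around ranges such as fri-mon. The function name should_run is hypothetical.

```python
from datetime import datetime

# Cron numbers the days of the week 0-6 starting on Sunday (7 is also Sunday).
DAY_NAMES = {"sun": 0, "mon": 1, "tue": 2, "wed": 3, "thu": 4, "fri": 5, "sat": 6}


def _value(token):
    token = token.strip().lower()
    return DAY_NAMES[token] if token in DAY_NAMES else int(token)


def _field(spec, low, high):
    """Expand one field such as '*', '7-17', 'mon-fri' or '1,3,5' into a set."""
    if spec == "*":
        return set(range(low, high + 1))
    values = set()
    for part in spec.split(","):
        if "-" in part:
            start, end = (_value(p) for p in part.split("-", 1))
            values.update(range(start, end + 1))
        else:
            values.add(_value(part))
    return values


def should_run(schedule, now):
    """True if `now` falls inside a 'minute hour day-of-week' schedule tag."""
    minute, hour, dow = schedule.split()
    days = {d % 7 for d in _field(dow, 0, 6)}  # fold 7 into Sunday
    cron_dow = (now.weekday() + 1) % 7  # Python: Monday=0; cron: Sunday=0
    return (now.minute in _field(minute, 0, 59)
            and now.hour in _field(hour, 0, 23)
            and cron_dow in days)


tag = "* 7-17 mon-fri"
print(should_run(tag, datetime(2014, 10, 3, 17, 59)))  # Friday, 5:59PM
print(should_run(tag, datetime(2014, 10, 3, 18, 0)))   # Friday, 6:00PM
print(should_run(tag, datetime(2014, 10, 4, 9, 0)))    # Saturday morning
```

October 3, 2014 was a Friday, so the first call falls inside the schedule and the other two do not. Changing the hour range to 7-18 makes 6:30PM on a weekday match as well, which is the off-by-one-hour trap described above.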

Write a bit of code

Once you’ve tagged your instances with their desired schedules, it’s time for a little coding. You can write a small program that powers instances down or boots them up according to their schedules. We call ours Valet.

Cron runs Valet every 15 minutes. On each run, Valet asks AWS for the instances that have a “schedule” tag, compares each assigned schedule to the current time, and decides whether the instance should be turned off, turned on, or left alone. Valet then starts or stops the instances accordingly.
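The following is a minimal sketch of a single Valet pass. It is not Cloudability’s actual code. It assumes `ec2` behaves like a boto3 EC2 client (`describe_instances`, `start_instances`, and `stop_instances` are real boto3 methods) and that `should_run(schedule, now)` is a schedule matcher such as the one sketched earlier. Both are passed in so the logic can be exercised without AWS credentials. Result pagination is ignored for brevity. A crontab entry such as `*/15 * * * * /usr/local/bin/valet` (a hypothetical path) would run it every 15 minutes.

```python
def valet(ec2, should_run, now):
    """One pass: start or stop every instance that carries a 'schedule' tag.

    ec2: an object with boto3-style describe_instances/start_instances/
    stop_instances methods. should_run(schedule, now) -> bool decides whether
    an instance ought to be up. Returns the (started, stopped) instance IDs.
    """
    response = ec2.describe_instances(
        Filters=[{"Name": "tag-key", "Values": ["schedule"]}]
    )
    to_start, to_stop = [], []
    for reservation in response["Reservations"]:
        for instance in reservation["Instances"]:
            tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
            state = instance["State"]["Name"]
            wanted = should_run(tags["schedule"], now)
            if wanted and state == "stopped":
                to_start.append(instance["InstanceId"])
            elif not wanted and state == "running":
                to_stop.append(instance["InstanceId"])
            # Anything else (already correct, or mid-transition such as
            # 'pending' or 'stopping') is left alone until the next pass.
    if to_start:
        ec2.start_instances(InstanceIds=to_start)
    if to_stop:
        ec2.stop_instances(InstanceIds=to_stop)
    return to_start, to_stop


# Real usage (requires boto3 and AWS credentials):
#   import boto3
#   from datetime import datetime
#   valet(boto3.client("ec2"), should_run, datetime.now())
```

Because each pass only compares the desired state with the current state, running it every 15 minutes is harmless. An instance that is already where it should be is simply skipped.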

Be wary of the fine print

There are several things to keep in mind when you first implement this process. A virtual machine that is stopped and started, as Valet does, may not come back with the same IP address. Some of our machines had hard-coded addresses and couldn’t find each other at their new IPs after a restart. We quickly learned that a self-discovery, self-registration, or provisioning system solves this problem. With one in place, you don’t have to intervene manually and reconnect your instances each time they start up again.
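One simple form of self-discovery is to look peers up by tag at start-up instead of hard-coding their addresses. The sketch below is an illustration of that technique, not the system the post describes. It assumes a hypothetical `role` tag and takes a dict shaped like boto3’s `describe_instances` response. Against a live account, you could instead let AWS do the filtering with the `tag:role` and `instance-state-name` filters.

```python
def peer_addresses(describe_response, role):
    """Private IPs of running instances whose hypothetical 'role' tag matches.

    describe_response: a dict shaped like boto3's describe_instances output.
    """
    addresses = []
    for reservation in describe_response["Reservations"]:
        for instance in reservation["Instances"]:
            tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
            running = instance["State"]["Name"] == "running"
            if running and tags.get("role") == role:
                addresses.append(instance["PrivateIpAddress"])
    return addresses
```

An application server could call this at boot (or periodically) to find its database. Because it reads whatever address an instance has right now, a stop and start by Valet no longer breaks the connection.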

We also introduced an automatic tagging system so we can keep tracking our usage even as Valet turns instances on and off. When an instance comes back up after its scheduled downtime, it gets tagged with the same values as the previous instance. Tagging instances this way makes it easy to see how much scheduled downtime saves on server costs.
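A plain EC2 stop and start keeps an instance’s tags. Re-tagging matters when a restart actually produces a new instance. In that case, one way to carry cost-tracking tags forward is to copy them across. This is a minimal sketch, assuming `ec2` behaves like a boto3 EC2client (`create_tags(Resources=..., Tags=...)` is a real boto3 call). The function name is hypothetical. Keys beginning with `aws:` are reserved by AWS and can’t be set by users, so they are skipped.

```python
def copy_tags(ec2, source_tags, new_instance_id):
    """Re-apply a previous instance's tags to a newly launched instance.

    source_tags: list of {"Key": ..., "Value": ...} dicts, as returned in the
    'Tags' field of boto3's describe_instances output.
    Returns the tags that were applied.
    """
    # Skip AWS-reserved keys such as aws:cloudformation:stack-name.
    tags = [t for t in source_tags if not t["Key"].startswith("aws:")]
    if tags:
        ec2.create_tags(Resources=[new_instance_id], Tags=tags)
    return tags
```

Copying every user-defined tag, rather than a hand-picked list, keeps the new instance in the same cost reports as the old one without anyone having to remember which tags matter.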

Sit back while Valet drives

Initially, we cut the cost of six instances from $284.59 a week to $101.64 by assigning schedules and running Valet. That’s a 64% reduction, or almost $10,000 a year. Now imagine how many more machines in a typical infrastructure would likely benefit from this process, and how much more money they are racking up.
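The savings figures can be checked with a few lines of arithmetic. The inputs are the two weekly costs reported above, and 52 weeks a year is the only assumption.

```python
before_per_week = 284.59  # weekly cost of the six instances before scheduling
after_per_week = 101.64   # weekly cost of the same instances under Valet

saved_per_week = before_per_week - after_per_week
saved_pct = 100 * saved_per_week / before_per_week
saved_per_year = saved_per_week * 52  # assumes 52 billing weeks a year

print(f"saved {saved_pct:.1f}% (${saved_per_week:.2f}/week, ${saved_per_year:,.2f}/year)")
```

This works out to $182.95 a week, about 64.3%, or roughly $9,513 a year.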

While identifying and setting a schedule for each of your instances may seem daunting at first, the resulting savings can be considerable. Chances are you’ll also save your company time and effort down the road, because you won’t have to scramble to cut costs under budget pressure.

Take it further

If you’d like to take your scheduling process even further, there are a few additional steps you can take to ensure smooth sailing.

First, consider introducing a script that regularly reviews all of your assigned instance schedules and notifies the relevant teams and project owners of them. If an instance isn’t tagged with a schedule, and is therefore running all the time, the responsible team is notified. It can then either confirm that continuous uptime is appropriate or change the instance’s uptime through its schedule tag. This process helps promote accountability for instance usage and ensures that each instance runs on a schedule the team using it considers appropriate.
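A review script like this mostly needs to group instances by owner and flag the ones without a schedule. The sketch below assumes a hypothetical `team` tag identifies the owner, and that `instances` are dicts shaped like the `Instances` entries of boto3’s `describe_instances` output. Sending the resulting messages (by email, chat, or otherwise) is left out.

```python
from collections import defaultdict

UNSCHEDULED = "ALWAYS ON (no schedule tag)"


def schedule_report(instances):
    """Group instance schedules by owning team, flagging unscheduled ones.

    Instances without the hypothetical 'team' tag are grouped under 'unowned'.
    Returns {team: ["<instance id>: <schedule>", ...]}.
    """
    report = defaultdict(list)
    for instance in instances:
        tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
        team = tags.get("team", "unowned")
        schedule = tags.get("schedule", UNSCHEDULED)
        report[team].append(f"{instance['InstanceId']}: {schedule}")
    return dict(report)


for team, lines in schedule_report([
    {"InstanceId": "i-1", "Tags": [{"Key": "team", "Value": "qa"},
                                   {"Key": "schedule", "Value": "* 7-17 mon-fri"}]},
    {"InstanceId": "i-2", "Tags": [{"Key": "team", "Value": "qa"}]},
]).items():
    print(team, lines)
```

Grouping unowned instances under their own heading makes them visible too. Those are often the ones nobody remembers to schedule.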

You might also want to send out notifications before instances are stopped on schedule, and to allow manual schedule overrides. That way, an instance won’t go down until someone has been notified and has had a chance to override the downtime if necessary.
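Both ideas fit into the per-instance decision Valet already makes. The following is a minimal sketch for a running instance, assuming a hypothetical `valet-override-until` tag that holds an ISO 8601 timestamp (naive local time, like the `now` passed in). It also assumes `should_run(schedule, now)` is a schedule matcher like the one sketched earlier. With Valet running every 15 minutes, a 30-minute warning window gives at least one pass that can send the notice. The look-ahead only checks the single moment `now + warn_minutes`, which is fine for simple daily schedules.

```python
from datetime import datetime, timedelta

OVERRIDE_TAG = "valet-override-until"  # hypothetical tag name


def plan_stop(tags, now, should_run, warn_minutes=30):
    """Decide what to do about one running instance: 'keep', 'warn', or 'stop'.

    An override tag holding an ISO timestamp postpones any stop until then.
    'warn' means the instance is still in schedule but will be out of it
    warn_minutes from now, so its owners should be notified.
    """
    override = tags.get(OVERRIDE_TAG)
    if override and now < datetime.fromisoformat(override):
        return "keep"
    if not should_run(tags["schedule"], now):
        return "stop"
    if not should_run(tags["schedule"], now + timedelta(minutes=warn_minutes)):
        return "warn"
    return "keep"
```

Keeping the override in a tag means anyone with console access can extend an instance’s day without touching Valet itself, and the override expires on its own.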

Get started

To reduce your own unnecessary AWS costs by scheduling instance downtime, check your usage patterns with Cloudability Usage Analytics or contact our customer success team to walk through the process together. Log in or sign up for a free 14-day trial of Cloudability Pro to get started today!
