How to Minimize Risk when Rightsizing Your Cloud

By Gavin Cahill on October 10, 2018
Rightsizing your cloud usage can dramatically increase your efficiency and help you get more from your cloud. There is some risk to rightsizing, but the right analysis and visibility can help you minimize that risk.

The Double-Edged Sword of Cloud Flexibility

Back in the pre-cloud days, a big part of data center purchasing was predicting future use. It was often prudent to invest heavily in extra capacity to make sure there weren’t any outages. But not anymore. Now, if you need more resources, you just spin them up. Don’t need them anymore? Shut them down and stop paying for them. It’s much more flexible, scalable and reliable.

It also opens the door for waste. It’s extremely easy to spin up instances that have more capacity than you need, or to use a high-resource instance for one project, retask it to a different project with lower requirements, then forget about it. As a result, you could be getting billed for resources that you’re not using.

This is where rightsizing comes in. The practice of rightsizing helps you optimize your cloud use so that you pay for the resources you actually need. The money you save can then be used elsewhere, such as investing in key product features or getting additional cloud resources for other projects.

Is Rightsizing Worth the Risk?

There is always a risk when you’re rightsizing. If you succeed, you can free up much-needed working capital that can be reinvested elsewhere. If you fail, you have the potential to break vital company systems.

So is it worth the risk? When the risk is mitigated as much as possible, then yes. When done correctly, rightsizing can routinely provide cloud savings of 20-30%. There are plenty of other risky actions that return far less on the investment, and companies are doing them on a regular basis. Bain & Company found that most efficiency programs target a 10% increase in efficiency, a standard far below the potential efficiency increase from rightsizing.

To give you an idea of how effective rightsizing can be, imagine that you’re running an r3.4xlarge instance at a cost of $957.60 for 30 days. Now imagine that you could run the same load on a c5.4xlarge without any negative effects for 46% less, at only $521.60. Over a year (twelve 30-day periods), that adds up to $5,232 in savings. Now picture what would happen if you multiplied that $5,000-plus over 1,000 instances. All told, your company would save more than $5 million over the course of the year without losing any performance.
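The arithmetic behind that example can be checked with a short sketch. The two 30-day prices come from the example above; the helper names and the choice to treat a year as twelve 30-day periods are assumptions made for illustration.

```python
# Prices are the 30-day figures quoted in the example above.
R3_4XLARGE_30_DAY = 957.60  # USD
C5_4XLARGE_30_DAY = 521.60  # USD
PERIODS_PER_YEAR = 12       # assumption: a year is twelve 30-day periods


def annual_savings(current: float, candidate: float,
                   periods: int = PERIODS_PER_YEAR) -> float:
    """Savings per instance per year, rounded to cents."""
    return round((current - candidate) * periods, 2)


def percent_saved(current: float, candidate: float) -> float:
    """Relative price reduction, as a percentage of the current price."""
    return (current - candidate) / current * 100


per_instance = annual_savings(R3_4XLARGE_30_DAY, C5_4XLARGE_30_DAY)
fleet_total = per_instance * 1_000  # the hypothetical 1,000-instance fleet

print(f"Saved per instance per year: ${per_instance:,.2f}")
print(f"Price reduction: {percent_saved(R3_4XLARGE_30_DAY, C5_4XLARGE_30_DAY):.1f}%")
print(f"Saved across 1,000 instances: ${fleet_total:,.2f}")
```

The reduction works out to about 45.5%, which rounds to the 46% quoted above.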

That last part is the key to effective rightsizing — you should be able to do it without negatively impacting performance. That means doing everything possible to mitigate the risk. To do that, you need complete visibility into as many factors as possible. The more visibility you have, the more you’ll be able to make your rightsizing decisions with confidence.

How Do You Minimize Rightsizing Risk?

Visibility for Each Instance Across Multiple Dimensions

The first step to minimizing risk is to make sure that you have a view into the whole picture. This is tougher than it sounds because of the massive amount of data that needs to be processed and analyzed. After all, every instance has four core factors: CPU, memory, disk space and network speed. Usage data for those instances is logged every second. Couple that with cloud architectures that could easily include thousands of instances, and you’ve got billions of data points that need to be processed every day.
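To get a feel for that scale, here is a rough back-of-the-envelope sketch. It assumes exactly one sample per second for each of the four factors; the fleet sizes are hypothetical, and real monitoring setups often sample less frequently.

```python
METRICS_PER_INSTANCE = 4          # CPU, memory, disk, network
SECONDS_PER_DAY = 24 * 60 * 60    # 86,400


def data_points_per_day(instances: int,
                        metrics: int = METRICS_PER_INSTANCE,
                        samples_per_second: int = 1) -> int:
    """Raw samples generated per day under the stated assumptions."""
    return instances * metrics * samples_per_second * SECONDS_PER_DAY


# Hypothetical fleet sizes, for illustration only.
for fleet in (100, 1_000, 5_000):
    print(f"{fleet:>5} instances -> {data_points_per_day(fleet):,} data points/day")
```

Under these assumptions, a fleet crosses the billion-points-per-day mark at roughly 3,000 instances.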

For smaller cloud systems, it can be possible to track this on your own with intricate spreadsheets and complicated homegrown tools. It’s tricky and error-prone, but doable. If you have a larger cloud architecture or enterprise setup, then you’re going to want to find a specialized management tool to handle it all.

Say you have a group of instances for video streaming, another cluster for machine learning processing, and one for running your dashboard interface. Each of these will have drastically different requirements, such as more transfer speed for streaming, more CPU for machine learning and a more generalized setup for the dashboard. Even within those clusters, different functions will have slightly different requirements. If you’re going to rightsize those instances, then you need to be able to see exactly how they’re being used.

Prediction Should Be Based on Your Actual Use and Needs — Not Averages

The core of rightsizing is taking past behavior and using it to predict behavior in the future. With each instance, you’re looking at how it was used in the past, then using that as a model for how it will be used in the future. (At Cloudability, we look at either the past 10 days or 30 days.) Using that model, you can then change your instance as necessary to fit your needs.

There are a couple of ways of building that model. A simple model will tabulate all of the instance’s use, take an average and recommend you get capacity that fits that average. The drawback of this model is that it doesn’t compensate for peak usage. Unfortunately, the spikes during peak usage are usually when outages will cause the most problems for the greatest number of users.

We’ve found that a more effective model builds recommendations based on peak numbers in your past usage. That way, you’ll have the capacity you need during usage spikes and minimize the risk of outages.
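The difference between the two models can be illustrated with a minimal sketch. This is not Cloudability's model: the usage samples, the list of available sizes and the function names are all hypothetical, and the only point is to show how an average hides the spikes.

```python
from statistics import mean

# Hypothetical instance sizes, in vCPUs (not a real provider catalog).
AVAILABLE_VCPUS = [2, 4, 8, 16, 32]

# Hypothetical observed CPU demand, in vCPUs, with two spikes.
usage = [3.0, 3.5, 2.5, 3.0, 11.0, 3.0, 2.0, 12.5, 3.5, 3.0]


def smallest_fit(required: float) -> int:
    """Return the smallest available size that covers the required vCPUs."""
    for size in AVAILABLE_VCPUS:
        if size >= required:
            return size
    raise ValueError("no available size is large enough")


average_pick = smallest_fit(mean(usage))  # sized to the average
peak_pick = smallest_fit(max(usage))      # sized to the observed peak

# Count how many observed samples the average-based pick could not serve.
overloaded = sum(1 for u in usage if u > average_pick)

print(f"Average-based pick: {average_pick} vCPUs ({overloaded} samples over capacity)")
print(f"Peak-based pick:    {peak_pick} vCPUs")
```

With this sample data, the average-based pick looks cheaper but would have been overloaded during both spikes, while the peak-based pick covers every observed sample.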

Machine Learning Makes Prediction Even More Accurate

All that usage data is the perfect fuel for machine learning. The underlying principle is that the effectiveness of any 10-day prediction can be judged 10 days later. Machine learning can be used to repeatedly test those predictions and constantly refine the prediction model. So your prediction isn’t just based on the past 10 days of that instance’s use. It’s based on the past 10 days compared to the effectiveness of thousands of past prediction models.
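The idea of judging a prediction once its window has passed can be sketched as a simple backtest. This is a deliberately small stand-in, not a machine learning model and not Cloudability's method: it predicts the next window's peak as the previous window's peak times a safety margin, then uses history to pick the smallest margin that would rarely have been wrong. The function names, candidate margins, miss-rate threshold and usage data are all hypothetical.

```python
WINDOW = 10  # days, matching the 10-day look-back mentioned above


def backtest(daily_peaks, margin, window=WINDOW):
    """Replay history: return (misses, trials) for a given safety margin.

    A miss means the actual peak of the following window exceeded the
    prediction made from the preceding window.
    """
    misses = trials = 0
    for start in range(window, len(daily_peaks) - window + 1):
        predicted = max(daily_peaks[start - window:start]) * margin
        actual = max(daily_peaks[start:start + window])
        trials += 1
        if actual > predicted:
            misses += 1
    return misses, trials


def pick_margin(daily_peaks, candidates=(1.0, 1.1, 1.25, 1.5), max_miss_rate=0.05):
    """Choose the smallest candidate margin whose past miss rate is acceptable."""
    for margin in sorted(candidates):
        misses, trials = backtest(daily_peaks, margin)
        if trials and misses / trials <= max_miss_rate:
            return margin
    return max(candidates)  # fall back to the most conservative candidate


# Hypothetical workload whose daily peak grows by one unit per day.
peaks = [50 + day for day in range(40)]
print("No margin:", backtest(peaks, 1.0))
print("Chosen margin:", pick_margin(peaks))
```

For a steadily growing workload like this one, predicting with no margin misses every time, which is the kind of feedback a learning system could use to refine its next prediction.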

At Cloudability, our approach to lowering risk involves giving our machine learning models as much data as possible. Our model pulls from the data of thousands of customers, billions of dollars of past usage and trillions of hours of usage data to back up its decisions. We believe that this is the most effective way to create dependable recommendations that allow you to make confident rightsizing decisions with minimal risk.

You Need to See Your Risk Options

Every rightsizing choice will have some risk of peaking, even if it’s a very tiny risk. With 115 different types of instances on AWS alone, you’ve got plenty of choices, each with a different degree of risk. Risk is measured by the probability that usage will peak above the capacity you choose. The higher the probability, the higher the risk.

Consider two recommendations for the same instance. The first has very little risk, since the CPU level is chosen to leave room for the peaks.

The second has significantly more risk. Its recommended CPU level will be fine for most traffic, but it will have trouble with future peaks.
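One simple way to put a number on that kind of risk is the fraction of past samples that would have exceeded a candidate's capacity. The sketch below uses that empirical measure as an assumption; the option names, capacities, costs and usage samples are hypothetical and do not correspond to real instance types or to how any particular tool scores risk.

```python
def peak_risk(samples, capacity):
    """Fraction of observed samples that exceed the given capacity."""
    if not samples:
        raise ValueError("need at least one usage sample")
    return sum(1 for s in samples if s > capacity) / len(samples)


def cheapest_within_tolerance(samples, options, tolerance):
    """Return the name of the cheapest option whose risk is within tolerance.

    `options` maps a name to (capacity, monthly_cost). Returns None if no
    option is safe enough.
    """
    acceptable = [
        (cost, name)
        for name, (capacity, cost) in options.items()
        if peak_risk(samples, capacity) <= tolerance
    ]
    return min(acceptable)[1] if acceptable else None


# Hypothetical CPU demand in vCPUs: mostly flat, with a few spikes.
usage = [3] * 16 + [6, 7, 10, 12]

# Hypothetical options: name -> (vCPU capacity, monthly cost in USD).
options = {"small": (4, 100), "medium": (8, 200), "large": (16, 400)}

for name, (capacity, cost) in options.items():
    print(f"{name:<6} risk={peak_risk(usage, capacity):.0%} cost=${cost}")

print("Zero tolerance:", cheapest_within_tolerance(usage, options, 0.0))
print("10% tolerance: ", cheapest_within_tolerance(usage, options, 0.10))
```

Laying the options out this way makes the tradeoff explicit: each step down in cost buys a measurable step up in the chance of hitting a peak.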

In the end, the right choice really comes down to how much risk you’re willing to take. Your risk tolerance could depend on a variety of factors. Maybe you’ve updated the code so the peaks won’t happen. Maybe your expected use is changing, or maybe it’s a staging environment where a peak means a slower return instead of a process shutdown.

Whatever your reasons, your risk tolerance is a choice that should be made by you. And in order to make rightsizing decisions based on that tolerance, you need to make sure that you have full visibility into the possible risk of your rightsizing choices.

The Cloud Is Dynamic. Your Rightsizing Should Be, Too

The nature of the cloud is to always change. Every year, AWS, GCP and Azure announce hundreds of changes to their pricing, instance offerings and services. All of these will influence your rightsizing choices. If you do an extensive rightsizing project in January, then who knows if some of those choices will still be the best ones in November when all the big conferences are done and the big announcements are out. And the cloud providers aren’t the only ones changing things up. The cloud gives companies unparalleled flexibility, and that flexibility translates into dynamic cloud architectures.

This is where data science is so crucial to your rightsizing efforts. Using a data science approach, models can be made that incorporate the latest enhancements and usage data, then cross-reference them against past usage data. When combined with machine learning, you’re able to find opportunities you didn’t know about based on data you didn’t know existed.

Rightsizing must be a continuous practice that’s dynamically adjusting to the new offerings and your new cloud use. As a bonus, this also helps the actual task of rightsizing be less arduous. The first time you open the Rightsizing feature in Cloudability, you’ll be greeted by pages of recommended rightsizing actions, each with the potential to save you thousands of dollars. The more you rightsize, the smaller that list will be each time you log in, and the easier it will be to keep on top of optimization.

Are you ready to implement rightsizing in your company? Make sure you have the visibility and data you need to have confidence in your decisions. Sign up for your free trial of Cloudability and see what you’re missing!
