Ask the Experts: How to Optimize Compute for a Growing Cloud
This week, Cloudability Director of Operations, Matt Finlayson, will share insights about how he helped Cloudability save 40% on our monthly cloud bill by using our own product, with a specific focus on applying our Rightsizing feature to AWS EC2 instances.
In our “Ask the Experts” series we go straight to the source to consult one of our own team members about what they do best so we can pass along their learnings and best practices for you to apply within your own organization.
Finding Opportunities to Rightsize EC2
The first challenge is finding a time to rightsize. The earliest opportunity is the initial release of a service from development to production, but engineers rarely have enough data before a release to make rightsizing decisions. More often, the motivation is a performance issue introduced by a code change or an increase in users. And when a bill simply becomes too large to bear, engineers have a strong enough incentive to spend the time to rightsize.
In reality, every deployment is an opportunity to evaluate whether a workload is appropriately sized. Before we began developing our Rightsizing feature, the biggest barrier to rightsizing was the forensics involved.
I would have to identify likely services and work out the trends in their disk, memory, CPU and network usage. Then I would have to choose instance size changes that would keep the workload performant. Finally, I’d have to decide if the work involved in moving the service would be worth the savings.
Our Rightsizing feature has removed the legwork for me and makes it easy to identify changes that keep our application performant, stable and cost efficient.
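That last decision, whether the move is worth the savings, can be framed as a simple payback calculation. The following is a minimal sketch rather than anything from the Rightsizing feature itself; the function name and all of the numbers are hypothetical.

```python
def payback_months(migration_hours: float, hourly_rate: float,
                   monthly_savings: float) -> float:
    """Months of savings needed to repay the one-off engineering effort."""
    if monthly_savings <= 0:
        # No savings means the effort is never repaid.
        return float("inf")
    return (migration_hours * hourly_rate) / monthly_savings


# Hypothetical example: 8 hours of work at $100/hour to save $200/month.
print(payback_months(8, 100.0, 200.0))
```

A short payback period suggests the move is worth scheduling; a long one suggests the engineering time is better spent elsewhere.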
How to Introduce a Culture of Cost Management Within your Team
We’re really excited about our new Rightsizing feature because it is a critical tool that helps us show our customers how to use the cloud more efficiently. We’ve been encouraging our customers to use this new feature and we’ve also been using it ourselves. Our goal with the Rightsizing feature hasn’t just been to save money each month, but also to enable all members of your organization to continually build a leaner cloud.
Over the last six years, we’ve been teaching our customers how to build what our team calls a culture of cost management. This is an important concept for cloud-based businesses with ever-growing infrastructures. The cloud gets complex, and so do its costs and the ways to use it optimally.
In a nutshell, cloud cost management is building the right processes, lines of communication and tools to empower your whole organization to make better cloud decisions that lead to a leaner, cost-efficient infrastructure. From an engineering point of view, a culture of cost management means internalizing these two principles on a daily basis:
Commit Ahead and Pay Less Over Time For the Things You’ll Always Need
The first thing you want to look at is “how do I pay less for the things that we’re already using that I know we’re going to keep using daily?” We look at workloads and their engineering roadmaps to identify services we know will be in use for a fixed period of time. Often this takes some analysis and coordination with engineering and product teams. The more confidence you have in the size and stability of your workloads, the simpler this step can be. Because we know we’ll be using these things all the time, we can confidently invest in Reserved Instances for them. If you know you’re going to be running a certain workload, you can pay less if you’re willing to commit to it.
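To make the commitment trade-off concrete, here is a minimal sketch in Python. The hourly prices are hypothetical placeholders, not AWS list prices, and the model assumes the reserved price is paid for every hour of a one-year term whether or not the instance is running.

```python
HOURS_PER_YEAR = 8760


def break_even_utilization(on_demand_hourly: float,
                           reserved_hourly: float) -> float:
    """Fraction of the term a workload must run for the commitment to pay off."""
    return reserved_hourly / on_demand_hourly


def yearly_savings(on_demand_hourly: float, reserved_hourly: float,
                   utilization: float) -> float:
    """Savings from committing; a negative value means on-demand was cheaper."""
    on_demand_cost = on_demand_hourly * HOURS_PER_YEAR * utilization
    reserved_cost = reserved_hourly * HOURS_PER_YEAR  # paid even when idle
    return on_demand_cost - reserved_cost


# Hypothetical prices: $0.10/hour on demand, $0.06/hour effective when reserved.
print(f"break-even utilization: {break_even_utilization(0.10, 0.06):.0%}")
print(f"savings if always on:   ${yearly_savings(0.10, 0.06, 1.0):.2f}")
print(f"savings at 40% usage:   ${yearly_savings(0.10, 0.06, 0.4):.2f}")
```

Under these assumptions, a workload that runs less often than the break-even fraction is cheaper on demand, which is why confidence in the stability of the workload matters before committing.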
For Everything Else, Buy Only What You Need
For other types of workloads it can be harder to make a long term commitment. Often this occurs with development environments, batch workloads, or legacy workloads that will not be around long term. For these types of workloads we want to make sure we’re using the correctly sized resources. This is where a tool like Rightsizing can help us tame our bill.
This can sound simple, but it takes some work to pay only as much as you need to for what you’re running. In a perfect world, your engineering teams have good performance data and you have a strong projection for load over time. Ideally this is all done up front, but plans change and load changes over time. We’ve found it effective to revisit the sizing of our workloads on a regular cadence. Often this approach will mean you try to build elastic workloads that respond to load over time, but Rightsizing will ensure the baselines and scaling increments match your price and performance model.
Understanding the Fundamentals of Compute on the Cloud
Rightsizing cloud resources involves removing the ones that aren’t doing any work, but the next crucial step is changing to resources that fit the workloads better. In the case of AWS EC2, there are a handful of families (and counting) that provide all kinds of cloud compute specifications and specializations. Here’s a quick list to consider:
- M4, T2: General purpose computing, burstable compute
- C4: CPU-optimized instances (buy these for compute-intensive workloads)
- R4: Memory-optimized instances (buy these for memory-intensive workloads)
- I3, D2: Dense storage-optimized instances (buy these for big ROI on storage)
- G2, P2: GPU-optimized instances that offer more computing power from GPU resources
- F1: A newer family built around field-programmable gate arrays (FPGAs) for custom hardware acceleration
- And likely more in the future!
Often, the profile of a workload doesn’t match its instance family. A CPU-intensive workload on an M4 instance, for example, might run more efficiently on a CPU-oriented instance like a C4. There is another nuance to instance sizing as well: most families have sizes from large through at least 8xlarge. Each jump, say from a c4.xlarge to a c4.2xlarge, gives you twice as much memory and CPU.
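The idea of matching a workload profile to a family can be sketched as a small rule of thumb. This is an illustration only: the function name and the 30-point gap threshold are assumptions, not values from the Rightsizing feature, and real decisions should also weigh disk and network usage.

```python
def suggest_family(avg_cpu_pct: float, avg_mem_pct: float,
                   gap: float = 30.0) -> str:
    """Suggest a family from the list above based on which resource dominates.

    `gap` is a hypothetical threshold: how many percentage points one
    resource must lead the other before the workload counts as skewed.
    """
    if avg_cpu_pct - avg_mem_pct >= gap:
        return "C4"  # CPU-heavy workload
    if avg_mem_pct - avg_cpu_pct >= gap:
        return "R4"  # memory-heavy workload
    return "M4"      # balanced workload, general purpose


print(suggest_family(85, 20))  # CPU dominates
print(suggest_family(20, 80))  # memory dominates
print(suggest_family(50, 45))  # roughly balanced
```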
If you have a workload that is using less than half the resources available to it, moving a size down in the same instance family can be an effective change.
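Because each size step doubles memory and CPU, the "less than half" rule translates into a one-step move down the size ladder. The sketch below assumes a simplified ladder from large to 8xlarge (real families can have other sizes) and uses peak utilization rather than averages, which is a cautious assumption on our part.

```python
# Simplified size ladder; actual families may include other sizes.
SIZES = ["large", "xlarge", "2xlarge", "4xlarge", "8xlarge"]


def size_down(instance_type: str):
    """Return the next size down in the same family, or None at the bottom."""
    family, size = instance_type.split(".", 1)
    idx = SIZES.index(size)  # raises ValueError for sizes outside the ladder
    if idx == 0:
        return None
    return f"{family}.{SIZES[idx - 1]}"


def recommend(instance_type: str, peak_cpu_pct: float, peak_mem_pct: float,
              limit_pct: float = 50.0):
    """Suggest a smaller size only if both CPU and memory stay under half."""
    if peak_cpu_pct < limit_pct and peak_mem_pct < limit_pct:
        return size_down(instance_type)
    return None


print(recommend("c4.2xlarge", peak_cpu_pct=30, peak_mem_pct=40))
print(recommend("c4.2xlarge", peak_cpu_pct=30, peak_mem_pct=60))
```

The first call suggests c4.xlarge; the second returns None because memory would not fit in an instance half the size.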
Your ability to Rightsize effectively increases greatly if you know which other resources you could move work to in order to get a better price and more value.
What I Look for When Rightsizing
Rightsizing is a great way to evaluate services that may have been over-provisioned or were quickly spun up to take on our latest big ideas.
I believe the Cloudability Rightsizing feature can be a great “cheat sheet” when optimizing your compute resources. Here’s what I mean:
When you have the right aggregate cost and usage data about your cloud infrastructure right in front of you, it clarifies the decision process. This is pretty tough to do with default tools and billing data alone. In the case of Cloudability’s Rightsizing feature, we immediately see fields like:
- Priority: What looks like it’s most likely underutilized based on utilization trends
- Resource name: The name of the resource that can likely tell you which team or department it belongs to (this is a case where a strong convention around instance naming is your friend)
- Resource ID: The AWS-specific resource identifier that pinpoints the exact resource that’s being underutilized
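If you export findings like these for your own triage, the three fields map naturally onto a small record type. The sketch below is not the Cloudability API or export format: the class, the numeric priority scale (1 = most likely underutilized), and the sample names and IDs are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    priority: int        # hypothetical rank: 1 = most likely underutilized
    resource_name: str   # human-readable name, ideally following a convention
    resource_id: str     # AWS-specific identifier


def by_priority(findings):
    """Order findings so the most likely underutilized resources come first."""
    return sorted(findings, key=lambda f: f.priority)


# Made-up sample data for illustration only.
findings = [
    Finding(3, "prod-api-web", "i-0aaa0000000000001"),
    Finding(1, "staging-reports-worker", "i-0aaa0000000000002"),
    Finding(2, "dev-search-indexer", "i-0aaa0000000000003"),
]

for f in by_priority(findings):
    print(f.priority, f.resource_name, f.resource_id)
```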
Get more details, as well as example reports, from our Rightsizing feature release article.
What to Look for First
There are two different approaches you want to take to Rightsizing, depending on the situation: The first situation would be when you are Rightsizing something that is a part of your own teams or projects. It’s pretty simple to take a look at something you work on and understand why it’s needed (or what could be cut down).
The second scenario would be when you see something that belongs to another team, and you want to approach this differently. If it’s not something you’re directly connected to, you can’t just go in and shut things off that look wrong to you. Cost efficiency doesn’t just mean “turning things off.” It’s about pulling the right data to have the right conversations with the right people to understand what can be rightsized/optimized.
Caution: Don’t Terminate or Alter Production Resources! Be Mindful of Workload Context
So I’m going to keep an eye on resource names that relate to the projects that I’m working on. For instance, in this example, I can see that several resources here seem to be candidates for Rightsizing:
Now I have a list of resources that I can quickly act on. If I still feel a resource is truly underutilized after looking at what work it is actually doing, I can ask my team to take action on it. Checking through this list frequently helps us catch what might be waste.
The staging workload on an r4.4xlarge costs $250 a month, but it is using only half the memory and is 95% idle. The R4 family is built for memory-intensive operations. Since we’re clearly not using this instance to its potential, I’m going to head over to the AWS console and take a closer look at this resource to see what I can do to either cut cost or put more work on this instance.
“Resource Name” is very helpful here, as I can use it to quickly grab the name of the instance that we need to take a closer look at. Is it a production workload? Is it a non-production workload? This lets you be more confident about taking action.
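With a consistent naming convention, the production question can be answered straight from the resource name. The sketch below assumes a hypothetical "<environment>-<team>-<service>" convention; your own naming scheme will differ, and anything that doesn’t match should be treated as unknown and checked by hand.

```python
PRODUCTION = {"prod", "production"}
NON_PRODUCTION = {"dev", "staging", "test", "qa"}


def classify(resource_name: str) -> str:
    """Classify a resource by the environment prefix of its name.

    Assumes a hypothetical "<environment>-<team>-<service>" convention.
    """
    env = resource_name.split("-", 1)[0].lower()
    if env in PRODUCTION:
        return "production"
    if env in NON_PRODUCTION:
        return "non-production"
    return "unknown"  # no recognizable prefix: verify manually before acting


print(classify("staging-data-etl"))
print(classify("prod-api-web"))
print(classify("legacy-box-7"))
```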
It’s More Than Just Removing Resources
Rightsizing is not always about eliminating cloud services to save money. If we can put more work onto underutilized services that we’re already paying for and that have spare capacity, that’s also a win.
One option would be to move the same workload onto a smaller size or a different instance family (if this is a workload that your team owns and you have all the permissions and authority to do so).
Another option would be to reconfigure the service to get more performance. For instance, some microservices in our workload let you configure the amount of memory they should use. You could talk to the service owner to confirm whether they absolutely need a certain amount of memory. If we can use less, we open up a few options to move to a smaller size in a different family that provides exactly what the workload needs, but costs a lot less.
This is the type of discussion and workflow that our Rightsizing tool creates. It’s one part of a bigger culture of cloud cost management. Rightsizing is a place to start looking at what’s going on and how you can save. It’s one part looking at the right data to determine what might be underutilized or wasteful, and another part talking to the right teammates to take the actions to optimize cloud efficiency.
This post addresses just one aspect of Rightsizing — it’s not just for compute! Coming up soon on our blog, we’ll explore how to optimize storage volumes for a growing cloud.