
The Law of Unintended Consequences: Cloud Billing Spikes

By Aaron Kaffen on June 22, 2012

The Cloud’s scalability is one of its greatest selling points. But with scalability comes variable costs, and whenever you have variable costs, it’s only a matter of time until something runs amok and you end up with a big spike in your cloud billing report.

[Image: Panos Ipeirotis pondering his cloud bill – Department of Information, Operations, and Management Sciences, New York University]

Case in point: one Panos Ipeirotis. Panos is “an Associate Professor and George A. Kellner Faculty Fellow at the Department of Information, Operations, and Management Sciences at Leonard N. Stern School of Business of New York University. He is also the Chief Scientist at Tagasauris, and [serves] as ‘academic-in-residence’ at oDesk Research.” In other words, he knows a little something about computers.

Panos is exactly the kind of person you would never expect to have a problem keeping his cloud billing under control. Yet, in April of 2012, he got an ominous email from AWS letting him know that there’d been an increase in his usage. In fact, his month’s spend was more than 10x his average … and it was going up and up every hour.

What followed was a fantastic bit of detective work on the part of Mr. Ipeirotis and a great reminder that cloud bill spikes happen to the best of us.

Here is his story:

It all started with an email.
From: Amazon Web Services LLC

Subject: Review of your AWS Account Estimated Month to Date Billing Charges of $720.85

Greetings from AWS,

During a routine review of your AWS Account’s estimated billing this month, we noticed that your charges thus far are a bit larger than previous monthly charges. We’d like to use this opportunity to explore the features and functionality of AWS that led you to rely on AWS for more of your needs.

You can view your current estimated monthly charges by going here:

https://aws-portal.amazon.com/gp/aws/developer/account/index.html?ie=UTF8&action=activity-summary

AWS Account ID: XXXXXXX27965
Current Estimated Charges: $720.85

If you have any feedback on the features or functionality of AWS that has helped enable your confidence in our services to begin ramping your usage we would like to hear about it.  Additionally, if you have any questions pertaining to your billing, please contact us by using the email address on your account and logging in to your account here:

https://aws-portal.amazon.com/gp/aws/html-forms-controller/contactus/aws-account-and-billing

Regards,
AWS Customer Service

This message was produced and distributed by Amazon Web Services LLC, 410 Terry Avenue North, Seattle, Washington 98109-5210


What? $720 in charges? My usual monthly charges for Amazon Web Services were around $100, so getting this email with a usage of $720 after just two weeks into the month was a big alert. I logged in to my account to see what was going on, and I saw this:

An even bigger number: $1,177.76 in usage charges! One thousand, one hundred and seventy-seven dollars, of which $1,065 was outgoing bandwidth transfer costs. The scary part: 8.8 Terabytes of outgoing traffic! Tera. Not Giga. Terabytes.

To make things worse, I realized that the cost was going up hour after hour. Fifty to a hundred dollars more in cloud bill charges with each. passing. hour. I started sweating.

What happened?

Initially I was afraid that a script I had set up to back up my photos from my local network to S3 had consumed that bandwidth. But then I realized that I had been running this backup-to-S3 script for a few months, so it could not suddenly start consuming more resources. In any case, all traffic incoming to S3 is free; this was a matter of outgoing traffic.
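(For context, a backup-to-S3 script of this sort might look like the minimal boto3 sketch below. The bucket name and local path are hypothetical; Panos’s actual script is not shown in the post.)

```python
# A minimal sketch of a backup-to-S3 script, using boto3.
# The bucket name and local directory below are hypothetical.
import os

import boto3

s3 = boto3.client("s3")
BUCKET = "my-photo-backups"  # hypothetical bucket name


def backup_directory(local_dir):
    """Upload every file under local_dir to S3, keyed by its relative path."""
    for root, _dirs, files in os.walk(local_dir):
        for name in files:
            path = os.path.join(root, name)
            key = os.path.relpath(path, local_dir)
            # Incoming (PUT) traffic to S3 is free; only outgoing traffic is billed.
            s3.upload_file(path, BUCKET, key)


backup_directory("/home/me/photos")  # hypothetical local path
```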

Then I started suspecting that the cause of this spike in my cloud bill might be the developers working on various projects of mine. Could they have mounted the S3 bucket on an EC2 machine in a different region? In that case, we might indeed have a problem, as all the I/O operations happening within a machine would count as bandwidth costs. I checked all my EC2 machines. No, this was not the problem. All my EC2 machines were in us-east, and my S3 buckets were all in US Standard. There are no charges for operations between EC2 machines and S3 buckets within the same region.

What could be causing this? Unfortunately, I did not have any logging enabled on my S3 buckets. I enabled logging and waited to see what would happen next. But the logs would take a few hours to appear, and the bandwidth meter was running. No time to waste.
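(Enabling server access logging is a single API call. A minimal boto3 sketch, with hypothetical bucket names; note that the target bucket must also grant write access to the S3 log delivery group.)

```python
# Turn on S3 server access logging for a bucket (boto3; names are hypothetical).
import boto3

s3 = boto3.client("s3")

s3.put_bucket_logging(
    Bucket="my-image-bucket",                  # the bucket to watch
    BucketLoggingStatus={
        "LoggingEnabled": {
            "TargetBucket": "my-log-bucket",   # where the log files land
            "TargetPrefix": "access-logs/",
        }
    },
)
```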

Thankfully, even in the absence of logging, Amazon provides access to usage reports for all AWS resources. The usage report indicated the bucket that was causing the problem:

My S3 bucket with the name “t_4e1cc9619d4aa8f8400c530b8b9c1c09” was generating 250GB of outgoing traffic, per hour.

Two-hundred-fifty Gigabytes. Per hour.

At least I had located the source of the traffic. It was a big bucket with images that were being used for a variety of tasks on Amazon Mechanical Turk.

But still something was strange. The bucket was big, approximately 250GB of images, but could Mechanical Turk generate so much traffic? Given that the average image was 500KB to 1MB in size, the bucket would have to be serving 250,000 to 500,000 images per hour. That is 100+ requests per second.
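(The arithmetic is easy to check; a quick sketch, taking the 500KB to 1MB average image size from above:)

```python
# Back-of-the-envelope: what request rate does 250GB/hour of egress imply?
bytes_per_hour = 250 * 1024**3                 # 250GB out, per hour

for avg_size in (500 * 1024, 1024**2):         # 500KB and 1MB average image sizes
    images_per_hour = bytes_per_hour / avg_size
    print(f"{images_per_hour:,.0f} images/hour "
          f"= {images_per_hour / 3600:,.0f} requests/second")
# -> roughly 146 requests/second at 500KB, 73 at 1MB
```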

There was no way that Mechanical Turk was responsible for this traffic. The cost of the Mechanical Turk tasks would have dwarfed the cost of bandwidth. Somehow the S3 bucket was being “Slashdotted,” but without being featured on Slashdot or anywhere else that I was aware of.

Strange.

Very strange.

Checking the Logs

Well, I had enabled logging for the S3 bucket, so I waited for the logs to appear.

The first logs showed up and I was in for a surprise. Here are the IPs and the user agents of the requests:
74.125.156.82 Mozilla/5.0 (compatible) Feedfetcher-Google; (+http://www.google.com/feedfetcher.html)
74.125.64.83 Mozilla/5.0 (compatible) Feedfetcher-Google; (+http://www.google.com/feedfetcher.html)
74.125.64.84 Mozilla/5.0 (compatible) Feedfetcher-Google; (+http://www.google.com/feedfetcher.html)
74.125.156.81 Mozilla/5.0 (compatible) Feedfetcher-Google; (+http://www.google.com/feedfetcher.html)
74.125.158.86 Mozilla/5.0 (compatible) Feedfetcher-Google; (+http://www.google.com/feedfetcher.html)
74.125.156.92 Mozilla/5.0 (compatible) Feedfetcher-Google; (+http://www.google.com/feedfetcher.html)
74.125.64.87 Mozilla/5.0 (compatible) Feedfetcher-Google; (+http://www.google.com/feedfetcher.html)
74.125.156.81 Mozilla/5.0 (compatible) Feedfetcher-Google; (+http://www.google.com/feedfetcher.html)
74.125.158.82 Mozilla/5.0 (compatible) Feedfetcher-Google; (+http://www.google.com/feedfetcher.html)
74.125.158.85 Mozilla/5.0 (compatible) Feedfetcher-Google; (+http://www.google.com/feedfetcher.html)
74.125.158.89 Mozilla/5.0 (compatible) Feedfetcher-Google; (+http://www.google.com/feedfetcher.html)
74.125.156.83 Mozilla/5.0 (compatible) Feedfetcher-Google; (+http://www.google.com/feedfetcher.html)
74.125.158.90 Mozilla/5.0 (compatible) Feedfetcher-Google; (+http://www.google.com/feedfetcher.html)
74.125.156.92 Mozilla/5.0 (compatible) Feedfetcher-Google; (+http://www.google.com/feedfetcher.html)
74.125.64.85 Mozilla/5.0 (compatible) Feedfetcher-Google; (+http://www.google.com/feedfetcher.html)
74.125.158.82 Mozilla/5.0 (compatible) Feedfetcher-Google; (+http://www.google.com/feedfetcher.html)
74.125.156.88 Mozilla/5.0 (compatible) Feedfetcher-Google; (+http://www.google.com/feedfetcher.html)
74.125.158.86 Mozilla/5.0 (compatible) Feedfetcher-Google; (+http://www.google.com/feedfetcher.html)
74.125.156.89 Mozilla/5.0 (compatible) Feedfetcher-Google; (+http://www.google.com/feedfetcher.html)
74.125.158.83 Mozilla/5.0 (compatible) Feedfetcher-Google; (+http://www.google.com/feedfetcher.html)
74.125.156.94 Mozilla/5.0 (compatible) Feedfetcher-Google; (+http://www.google.com/feedfetcher.html)
74.125.156.83 Mozilla/5.0 (compatible) Feedfetcher-Google; (+http://www.google.com/feedfetcher.html)
74.125.158.88 Mozilla/5.0 (compatible) Feedfetcher-Google; (+http://www.google.com/feedfetcher.html)
74.125.156.83 Mozilla/5.0 (compatible) Feedfetcher-Google; (+http://www.google.com/feedfetcher.html)
74.125.64.92 Mozilla/5.0 (compatible) Feedfetcher-Google; (+http://www.google.com/feedfetcher.html)
74.125.156.80 Mozilla/5.0 (compatible) Feedfetcher-Google; (+http://www.google.com/feedfetcher.html)
74.125.64.88 Mozilla/5.0 (compatible) Feedfetcher-Google; (+http://www.google.com/feedfetcher.html)
74.125.158.84 Mozilla/5.0 (compatible) Feedfetcher-Google; (+http://www.google.com/feedfetcher.html)
74.125.158.87 Mozilla/5.0 (compatible) Feedfetcher-Google; (+http://www.google.com/feedfetcher.html)
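(Tallying the culprits from a log like this takes only a few lines of scripting. A rough sketch, assuming the classic S3 server access log format, where the user agent is the last quoted field; the log file name is hypothetical:)

```python
# Count the IPs and user agents hitting the bucket, from S3 access logs.
import re
from collections import Counter

ip_pattern = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")
agents, ips = Counter(), Counter()

with open("s3_access.log") as log:          # hypothetical downloaded log file
    for line in log:
        quoted = re.findall(r'"([^"]*)"', line)
        if quoted:
            agents[quoted[-1]] += 1         # user agent: last quoted field
        match = ip_pattern.search(line)
        if match:
            ips[match.group()] += 1

print(agents.most_common(5))
print(ips.most_common(5))
```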

So, it was Google that was crawling the bucket. Aggressively. Very aggressively.

Why would Google crawl this bucket? Yes, the URLs were technically public, but there was no obvious place to find them. Google could not have gotten the URLs from Mechanical Turk: the images in tasks posted to Mechanical Turk are not accessible for Google to crawl.

At least I knew it was Google. I guess, somehow, I had let Google learn the URLs of the images in the bucket (how?) and Google had started crawling them. But something was still puzzling. How can an S3 bucket with 250GB of data generate 40 times that amount of traffic? Google should just download each object once and be done with it. It would not re-crawl the same object many times.

I checked the logs again. Interestingly enough, there was a pattern: each image was being downloaded every hour. Every single one of them. Again and again. Something was very, very strange. Google kept launching its crawlers, repeatedly, to download the same content in the S3 bucket, every hour. For a total of 250GB of traffic, every hour.
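(That hourly pattern can be confirmed mechanically. A rough sketch, again assuming the classic S3 access log layout and a hypothetical log file name:)

```python
# For each object, count how many distinct hours it was requested in.
from collections import defaultdict

hours_per_key = defaultdict(set)
with open("s3_access.log") as log:          # hypothetical log file
    for line in log:
        parts = line.split()
        if len(parts) < 9:
            continue
        hour = parts[2][1:15]   # "dd/Mon/yyyy:HH" from "[dd/Mon/yyyy:HH:MM:SS"
        key = parts[8]          # the requested object key
        hours_per_key[key].add(hour)

# Objects fetched in many distinct hours are being re-downloaded, not cached.
refetched = sum(1 for hours in hours_per_key.values() if len(hours) > 1)
print(f"{refetched} of {len(hours_per_key)} objects fetched in more than one hour")
```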

Google should have been smarter than that. Why waste all that bandwidth re-downloading an identical image every hour?

Why would Google download the same images again and again?

Looking more carefully, there was one red flag: this was not the Google search crawler. The search crawler identifies itself as Googlebot for web pages and Googlebot-Image for images. It is not called Feedfetcher, as this user agent was.

What the heck is Feedfetcher? A few interesting pieces of information from Google’s documentation: Feedfetcher is how Google grabs RSS or Atom feeds when users choose to add them to their Google homepage or Google Reader. Because it fetches content on behalf of users who explicitly requested it, rather than for the search index, it does not obey robots.txt.

Interesting. So these images were in some form of a personal feed.

Shooting myself in the foot, the Google Spreadsheet way

And this piece of information started to unravel the full story. I remembered!

All the URLs for these images were also stored in a Google Spreadsheet, so that I could inspect the results of the crowdsourcing process. (The spreadsheet was not used or accessed by the Mechanical Turk workers; it was just for viewing the results.) I used the =image(url) function to display a thumbnail of each image in a spreadsheet cell.
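(To illustrate, the cells contained formulas of the form =image("https://s3.amazonaws.com/&lt;bucket&gt;/&lt;key&gt;"). A tiny sketch of how such a column of formulas might be generated; the bucket name is the one from the story, but the object keys are made up:)

```python
# Generate a column of =image() formulas to paste into the results sheet.
BUCKET = "t_4e1cc9619d4aa8f8400c530b8b9c1c09"   # the bucket from the story
keys = ["task_001.jpg", "task_002.jpg"]         # hypothetical object keys

for key in keys:
    url = f"https://s3.amazonaws.com/{BUCKET}/{key}"
    print(f'=image("{url}")')                   # one formula per spreadsheet cell
```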

So, all this bandwidth waste was triggered by my own stupidity: I had asked Google to download all the images to create the thumbnails in the Google Spreadsheet. Talk about shooting yourself in the foot. I had launched the Google crawler myself.

But why did Google download the images again and again? That seemed puzzling. It was perfectly plausible that Google would fetch 250GB of data (i.e., the total size of the bucket), although I would have gone for a lazy-evaluation approach (loading on demand, as opposed to pre-fetching). But why download the same content again and again?

Well, the explanation is simple: apparently Google uses Feedfetcher as a “URL fetcher” for all sorts of “personal” URLs someone adds to its services, not only for feeds. Since these URLs are private, Google does not want to store the content permanently on its servers. That makes perfect sense from the point of view of respecting user privacy. The problem is that it rules out any form of caching, since Google stores none of the personal data.

So, every hour, Google was launching its crawlers against my bucket, generating a tremendous amount of traffic. Notice that even if I had a robots.txt file, Feedfetcher would have ignored it. (It is not even possible to place a robots.txt file in the root directory of https://s3.amazonaws.com, as that is a server shared by many different accounts; but in any case, Feedfetcher would have ignored it.)

The final touch in the overall story? Normally, if you were to do the same thing with URLs from a random website, Google would have rate limited its crawlers so as not to overload the site. However, s3.amazonaws.com is a huuuge domain, containing terabytes (petabytes?) of web content. Google has no reason to rate limit against such a huge, high-traffic domain. It made perfect sense to launch 100+ connections per second against a set of URLs hosted there…

So, I did not just shoot myself in the foot. I took a Tsar Bomba and launched it against my foot. The $1,000 bandwidth bill (generated pretty much within a few hours) was the price of my stupidity.

Ooof, mystery solved. I killed the spreadsheet and made the images private. Google started getting 403 errors, and I hoped it would soon stop. An expensive mistake, but at least it was resolved.
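(The fix itself is scriptable. A minimal boto3 sketch that flips every object in the bucket back to private, so that Feedfetcher’s requests start returning 403s; the bucket name is the one from the story:)

```python
# Make every object in the bucket private again (boto3).
import boto3

s3 = boto3.client("s3")
BUCKET = "t_4e1cc9619d4aa8f8400c530b8b9c1c09"   # the bucket from the story

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET):
    for obj in page.get("Contents", []):
        s3.put_object_acl(Bucket=BUCKET, Key=obj["Key"], ACL="private")
```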

And you cannot help but laugh at the following irony: one of the main arguments for using the AWS infrastructure is that it is virtually immune to any denial-of-service attack. On the other hand, that very resilience breeds a new type of attack: bring the service down not by stopping it, but by making it extremely expensive to run…

The real lesson: Google as a medium for launching an attack against others

Then I realized: this is a technique that can be used to launch a denial-of-service attack against a website hosted on Amazon (or even elsewhere). The steps:

  1. Gather a large number of URLs from the targeted website, preferably for big media files (JPG, PDF, etc.).
  2. Put these URLs in a feed, or just put them in a Google Spreadsheet.
  3. Add the feed to a Google service, or use the =image(url) function in the spreadsheet.
  4. Sit back and watch Google launch a Slashdot-style denial-of-service attack against your target.

What I find fascinating in this setting is that Google becomes such a powerful weapon due to a series of perfectly legitimate design decisions. First, they completely separate their index from the URLs they fetch for private purposes. Very clean and nice design. The problem? No caching. Second, Google does not do lazy evaluation on feeds; it pre-fetches them so they are ready and fresh for the user. The problem? Google launches its Feedfetcher crawlers again and again. Combine the two, and you have a very, very powerful tool that can generate untraceable denial-of-service attacks.

The law of unintended consequences. Scary and instructive at the same time: You never know how the tools that you build can be used, no matter how noble the intentions and the design decisions.

If you’d like to read more about Panos’ work, you can visit his blog, A Computer Scientist in a Business School.

If you want to make sure that you never end up with a cloud billing spike like this, sign up for a Cloudability account today!
