Amazon S3 is a very useful service. According to the official Amazon Web Services website:

"Amazon S3 is storage for the Internet. It is designed to make web-scale computing easier for developers."
It's a no-frills service and does exactly what it promises: it makes things easy for developers so they can concentrate on features and leave the scaling to Amazon. If you are new to Amazon S3, here's a good starting guide for you.
S3 is like a sharp sword: you must know how to handle it, or you can hurt yourself. That's exactly what happened to me. One of the many applications we are developing on MySpace, Sketch Me, required us to store (and serve) a huge amount of image data (user sketches). Due to the viral nature of the application, the load almost tripled every month. S3 was a clear choice: it saved us time, money, and headaches. Our current stats (with caching) are:
- Total files stored: 205GB
- Bandwidth per month: 2TB
- GET Requests per month: 112m
Clearly I would not like to waste time setting up image-serving servers that can handle such load, and I am more than happy to outsource it to Amazon S3.
You would be surprised: before caching, when our total stored images were just 5GB, the number of requests was 263 million ($363.91), almost double what it is now with 205GB of images.
So if we take total requests to be directly proportional to the number of images, at that rate the actual requests today would be approximately 4.5 billion, or about $15,000. :O
How did I tame the beast?
At first look, the pricing of Amazon's S3 service seems quite cheap. Wait until you get your first bill, and you will see how cents add up to huge dollar amounts.
- $0.150 per GB – first 50 TB / month of storage used
- $0.100 per GB – all data transfer in
- $0.170 per GB – first 10 TB / month data transfer out
- $0.01 per 1,000 PUT, POST, or LIST requests
- $0.01 per 10,000 GET and all other requests
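To see how these line items combine, here is a back-of-envelope estimate (sketched in Python for illustration; the prices are the tiers listed above, and I'm assuming 1TB = 1024GB, so a real invoice will differ slightly):

```python
# Back-of-envelope S3 bill estimate using the price tiers above.
# Illustration only, not a real invoice.

PRICE_STORAGE_GB = 0.15    # per GB-month (first 50 TB tier)
PRICE_XFER_OUT_GB = 0.17   # per GB (first 10 TB tier)
PRICE_GET_PER_10K = 0.01   # per 10,000 GET requests

def estimate_bill(storage_gb, xfer_out_gb, get_requests):
    storage = storage_gb * PRICE_STORAGE_GB
    transfer = xfer_out_gb * PRICE_XFER_OUT_GB
    gets = get_requests / 10_000 * PRICE_GET_PER_10K
    return round(storage + transfer + gets, 2)

# Our current stats: 205GB stored, ~2TB (2048GB) out, 112 million GETs.
print(estimate_bill(205, 2048, 112_000_000))  # roughly $490, bandwidth-dominated
```

Notice that with 263 million GETs (our pre-caching count), the request line alone is $263, which is why the GET column deserves the most attention.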
So after getting a first bill of a few hundred dollars, I sat down to figure out how to bring it down. While digging through the HTTP headers, I found that by default S3 doesn't set any cache headers on its responses. So even when a visitor has requested a file from S3 before and has it in their browser cache, the browser will still send an HTTP GET request to S3 just to verify whether the file has changed. S3 returns a 304 Not Modified response if the file has not changed, and the file won't be downloaded again. You may think S3 just saved you a few GB of bandwidth cost, but each of these requests costs you money ($0.01 per 10,000 GETs), and they are generally the bulk of the S3 bill.
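To make that revalidation round-trip concrete, here is a minimal sketch (in Python, with a made-up handle_conditional_get helper, not S3's actual code) of what the server is effectively doing. The key point: the billed event is the request itself, whether or not a body is sent back:

```python
# Sketch of conditional-GET revalidation. Names are illustrative,
# not S3's actual implementation.

def handle_conditional_get(client_etag, server_etag):
    """Return (status, body_sent) the way an HTTP server would."""
    if client_etag is not None and client_etag == server_etag:
        # File unchanged: no body is sent, but the request still
        # happened, and S3 bills it as a GET.
        return (304, False)
    return (200, True)

# First visit: nothing cached, file is downloaded.
print(handle_conditional_get(None, '"abc123"'))        # (200, True)
# Repeat visit with no Cache-Control: browser revalidates anyway.
print(handle_conditional_get('"abc123"', '"abc123"'))  # (304, False)
```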
Since the photos our users upload almost never change, asking S3 every time whether the file has changed is certainly not required. You can stop the browser from sending this extra request by setting an appropriate Cache-Control or Expires header on the files. For example, Cache-Control: max-age=864000 tells the browser not to request the same file again for the next 10 days (3600*24*10 seconds).
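For reference, here is what those two headers look like on the wire (a quick Python sketch; the Expires date is simply whatever 10 days from now happens to be):

```python
# Building the two cache headers discussed above.
from email.utils import formatdate
import time

max_age = 3600 * 24 * 10  # 10 days, in seconds
cache_control = f"Cache-Control: max-age={max_age}"

# Expires wants an absolute HTTP-date; formatdate(usegmt=True)
# emits the standard RFC 1123 format.
expires = "Expires: " + formatdate(time.time() + max_age, usegmt=True)

print(cache_control)  # Cache-Control: max-age=864000
print(expires)
```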
Fortunately S3 allows us to set these headers, but there is no simple, easy way to do it. So I decided to write a small script (see below) to achieve this.
After setting the cache headers, my bill came down drastically. Traffic doubled and the number of images almost tripled (5GB to 15GB) within a month, while the number of requests fell to a third: roughly a nine-fold reduction in request cost per image. Ideally, your bandwidth cost should be more than your request cost.
One caveat: once a file is cached this aggressively, browsers won't notice changes until the cache expires. For files that do change, such as stylesheets, version the URL with a query string so the browser treats the updated file as a new one:

<link href="http://s3.amazonaws.com/lalit/style.css?v=3" ... />

after a change in the stylesheet, change your code to,

<link href="http://s3.amazonaws.com/lalit/style.css?v=4" ... />
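The version bump can live in one small helper so you never hand-edit URLs (a sketch; asset_url and ASSET_VERSIONS are made-up names, not part of my script):

```python
# Cache-busting via query-string version. When a file changes, bump its
# version: every URL referencing it changes, so browsers re-download it
# even though the old URL is still cached for 10 days.

ASSET_VERSIONS = {"style.css": 4}  # bump the number on each deploy

def asset_url(bucket, name):
    v = ASSET_VERSIONS.get(name, 1)
    return f"http://s3.amazonaws.com/{bucket}/{name}?v={v}"

print(asset_url("lalit", "style.css"))  # http://s3.amazonaws.com/lalit/style.css?v=4
```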
You can download the code here (zip 6k).
Update: Fixed a bug in the code (MIME type calculation was failing on some PHP configurations).
To use the script, upload it to your server (running PHP) and edit the upload.php file to specify your AWS access key, your AWS secret key, and your S3 bucket name.
/* One time settings. */
$awsAccessKey = '---';        // your AWS Key
$awsSecretKey = '---';        // your AWS Secret
$bucket_name  = '---';        // S3 Bucket name
$age          = 3600*24*10;   // Cache age: 10 days

/* File Data */
$s3_dir_name  = 'dir1/dir2/';   // Directory on S3 where you want to upload the file,
                                // e.g. http://s3.amazonaws.com/bucket_name/dir1/dir2/filename.ext
$upload_file  = 'filename.ext'; // Name of the file you want to upload.
                                // Keep it in the same dir as this file.
After you have saved the config info, every time you need to upload a file you have to specify the file name and the directory name in upload.php and call it from the browser: http://yoursite.com/s3/upload.php. (I know it's clunky; I promise to make it better soon.)
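If you would rather script the upload directly, the essential part can be sketched without PHP at all. The Python sketch below (the key, secret, and bucket values are placeholders) builds the signature-version-2 Authorization header that this kind of script sends with the PUT; the point for this article is that Cache-Control must be supplied at upload time, because S3 stores whatever response headers you give it and replays them on every GET:

```python
# Sketch: signing an S3 PUT (old signature v2 scheme) that sets
# Cache-Control. Key, secret, and bucket below are fake placeholders.
import base64, hashlib, hmac
from email.utils import formatdate

def sign_put(secret_key, bucket, key, content_type, date):
    # Signature v2 is an HMAC-SHA1 over a canonical string-to-sign.
    string_to_sign = "\n".join([
        "PUT",               # HTTP verb
        "",                  # Content-MD5 (optional, left empty)
        content_type,        # Content-Type header
        date,                # Date header
        f"/{bucket}/{key}",  # canonicalized resource
    ])
    digest = hmac.new(secret_key.encode(), string_to_sign.encode(),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode()

date = formatdate(usegmt=True)
signature = sign_put("FAKE_SECRET", "lalit", "dir1/dir2/filename.ext",
                     "image/png", date)

headers = {
    "Date": date,
    "Content-Type": "image/png",
    "Cache-Control": "max-age=864000",  # stored by S3, replayed on every GET
    "Authorization": f"AWS FAKE_ACCESS_KEY:{signature}",
}
print(len(signature))  # base64 of a 20-byte SHA-1 digest: 28 chars
```

Note that Cache-Control does not enter the string-to-sign; only x-amz-* headers do under signature v2, so it can be added or changed without affecting the signing logic.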
If you want to avoid all this trouble, there is a paid tool, Bucket Explorer, which helps you upload files with custom headers. I haven't used it myself, but it looks great and works on Windows/Linux/Mac.
I hope your next S3 bill comes down.