Amazon Simple Storage Service [S3]
What is S3?
Simple Storage Service (S3) is a web service that gives any developer access to highly scalable, very reliable, inexpensive storage space. Your data is replicated to multiple servers at multiple data centers.
How to get started
Go to Amazon’s AWS page, then to the S3 page and sign up (check out the other services while you are there)
Pricing (From Amazon’s site…)
Pricing
…
New Pricing (effective June 1st, 2007)
Storage
$0.15 per GB-Month of storage used
Data Transfer
$0.10 per GB - all data uploaded
$0.18 per GB - first 10 TB / month data downloaded
$0.16 per GB - next 40 TB / month data downloaded
$0.13 per GB - data downloaded / month over 50 TB
Data transferred between Amazon S3 and Amazon EC2 is free of charge
Requests
$0.01 per 1,000 PUT or LIST requests
$0.01 per 10,000 GET and all other requests*
* No charge for delete requests
Storage and bandwidth size includes all file overhead
I looked around the web for similar services (it is hard to find anyone who posts prices), and for 180 GB of redundantly stored data, they were in the $200/month price range.
The same 180 GB on S3 would be
$27 to store for 30 days (180 GB × $0.15)
$18 to transmit (the entire 180 GB) to S3 (180 GB × $0.10)
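The arithmetic behind those two figures can be sketched in a few lines of PHP. This is only an illustration using the June 2007 rates quoted above; the constant and function names are my own, and request and download charges are left out.

```php
<?php
// Rates from the June 2007 price list quoted above (USD).
const STORAGE_PER_GB_MONTH = 0.15;
const UPLOAD_PER_GB        = 0.10;

/**
 * Estimate the cost of uploading $gigabytes to S3 and keeping it for a month.
 * Request charges and downloads are deliberately ignored.
 */
function estimateFirstMonthCost(float $gigabytes): array
{
    return [
        'storage' => round($gigabytes * STORAGE_PER_GB_MONTH, 2),
        'upload'  => round($gigabytes * UPLOAD_PER_GB, 2),
    ];
}

$cost = estimateFirstMonthCost(180);
printf('storage: $%.2f, upload: $%.2f' . "\n", $cost['storage'], $cost['upload']);
// prints "storage: $27.00, upload: $18.00"
```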
In addition to price, with S3, you are in full control of how and when you put and/or get your data.
Only pay for what you actually use
One of the niceties of S3 (and the other Amazon web services) is that you pay only for what you use.
Didn’t use the service last month? Pay $0.
This allows you to ‘tinker’ all you want for mere pennies.
Amazon S3 - Objects
What is an Object?
Object is the term we use in S3 for the ‘thing’ (file/data) you want to store.
Once an object is stored in S3, it contains the original data (the contents of the file) plus some metadata (name/value pairs).
You can add your own metadata, but some of the standard entries are ‘Last-Modified’ and ‘Content-Type’.
An object can be anywhere from 1 byte to 5 GB in size.
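Your own metadata travels as request headers with an x-amz-meta- prefix (the PUT example later in this post sends x-amz-meta-title). A minimal sketch of turning name/value pairs into those headers, with a function name I made up for illustration:

```php
<?php
/**
 * Turn user metadata (name => value) into x-amz-meta-* request headers.
 * Names are lower-cased so the header names come out consistent.
 */
function metadataHeaders(array $metadata): array
{
    $headers = [];
    foreach ($metadata as $name => $value) {
        $headers['x-amz-meta-' . strtolower($name)] = (string) $value;
    }
    return $headers;
}

foreach (metadataHeaders(['Title' => 'my title']) as $name => $value) {
    echo $name, ': ', $value, "\n";
}
// prints "x-amz-meta-title: my title"
```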
Amazon S3 - Buckets
Why Buckets?
A bucket provides a unique namespace for managing the objects it contains.
Bucket names are global across all of S3 (shared by all users of S3, a similar concept to ‘domain names’).
An S3 account is allowed 100 buckets.
Amazon S3 - Keys
Key
A key is the unique identifier for an object within a bucket
Locating an object
Any Object can be located by its [bucket + key] using a RESTful formatted URL
http://s3.amazonaws.com/foo-products/2006/may/1845.prd
Here foo-products is the bucket and 2006/may/1845.prd is the key.
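Building that URL from a bucket and key is a one-liner, give or take some encoding. A small sketch, assuming the path-style address shown above; objectUrl is a hypothetical helper name, and percent-encoding each key segment is my own precaution for keys with spaces or other special characters.

```php
<?php
/**
 * Build the path-style URL for an object from its bucket and key.
 * Each key segment is percent-encoded; the "/" separators are kept.
 */
function objectUrl(string $bucket, string $key): string
{
    $segments = array_map('rawurlencode', explode('/', $key));
    return 'http://s3.amazonaws.com/' . $bucket . '/' . implode('/', $segments);
}

echo objectUrl('foo-products', '2006/may/1845.prd'), "\n";
// prints "http://s3.amazonaws.com/foo-products/2006/may/1845.prd"
```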
S3 - Authentication
Most requests to S3 require authentication. This ensures that you don’t get charged for operations you didn’t authorize, and that nobody else sees your private data.
You can set one of several access policies (ACLs) on an Object or an entire Bucket:
- private
- public-read
- public-read-write
- authenticated-read
To set the ACL when you PUT the Object to S3, you add an x-amz-acl header. For example…
x-amz-acl: public-read
The ACL defaults to private if it is not set on the PUT.
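In PHP this boils down to adding one more header to the PUT, or leaving it off to get the private default. A minimal sketch; aclHeader is a made-up helper name, and rejecting unknown values is my own choice rather than anything S3 requires of the client.

```php
<?php
/**
 * Return the extra header needed to request one of the four ACLs listed above.
 * Passing null adds no header, so S3 falls back to "private".
 */
function aclHeader(?string $acl): array
{
    $canned = ['private', 'public-read', 'public-read-write', 'authenticated-read'];
    if ($acl === null) {
        return [];
    }
    if (!in_array($acl, $canned, true)) {
        throw new InvalidArgumentException("Unknown ACL: $acl");
    }
    return ['x-amz-acl' => $acl];
}

print_r(aclHeader('public-read'));
```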
S3: Putting it all Together
How do you speak to S3?
At this point, all interaction is done over HTTP (the one current exception is that you can retrieve objects using either HTTP or BitTorrent).
So, creating a program to interact with S3 is just a matter of creating HTTP requests and reading HTTP responses, something PHP is quite capable of (especially with a little help from PEAR HTTP_Request and Crypt_HMAC).
First get an account and your Keys
- Access Key ID: You add this to any requests to S3. Essentially it is your unique identifier that tells S3 a given request is targeted for your account.
- Secret Access Key: For requests for Objects whose ACLs require authentication (i.e. private, authenticated-read), you ‘sign’ your request with this secret key.
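To make the ‘signing’ step concrete, here is a minimal sketch of the HMAC-SHA1 header signature described in the S3 developer docs of this era: a string built from the verb, Content-MD5, Content-Type, date, any x-amz-* headers and the resource path is signed with the secret key and base64-encoded. I am assuming PHP’s built-in hash_hmac() here instead of PEAR Crypt_HMAC, the function name is my own, and the keys are placeholders.

```php
<?php
/**
 * Compute the value that follows "AWS " in the Authorization header.
 *
 * $amzHeaders holds any x-amz-* headers (name => value); they are
 * lower-cased and sorted by name before being added to the string to sign.
 */
function s3AuthorizationValue(
    string $accessKeyId,
    string $secretKey,
    string $verb,
    string $resource,
    string $date,
    string $contentType = '',
    string $contentMd5 = '',
    array $amzHeaders = []
): string {
    $canonicalAmz = '';
    $amzHeaders = array_change_key_case($amzHeaders, CASE_LOWER);
    ksort($amzHeaders);
    foreach ($amzHeaders as $name => $value) {
        $canonicalAmz .= $name . ':' . trim((string) $value) . "\n";
    }

    $stringToSign = $verb . "\n"
        . $contentMd5 . "\n"
        . $contentType . "\n"
        . $date . "\n"
        . $canonicalAmz
        . $resource;

    // Raw (binary) HMAC-SHA1 of the string, then base64.
    $signature = base64_encode(hash_hmac('sha1', $stringToSign, $secretKey, true));

    return $accessKeyId . ':' . $signature;
}

$auth = s3AuthorizationValue(
    'EXAMPLE-ACCESS-KEY-ID',       // placeholder, not a real key
    'example-secret-access-key',   // placeholder, not a real key
    'GET',
    '/foo-products/2006/may/1845.prd',
    gmdate('D, d M Y H:i:s') . ' GMT'
);
echo 'Authorization: AWS ', $auth, "\n";
```

The date passed in has to be the same string that goes out in the Date header, otherwise the signature will not match on the S3 side.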
The following code examples are based on what is in the Amazon S3 developer docs
Create a Bucket
# create bucket request
PUT /[bucket-name] HTTP/1.0
Date: Wed, 08 May 2007 08:45:09 GMT
Authorization: AWS [aws-access-key-id]:[header-signature]
Host: s3.amazonaws.com

# create bucket response
HTTP/1.1 200 OK
x-amz-id-2: VjzdTviQorQtSjcgLshzCZSzN+7CnewvHA+6sNxR3VRcUPyO5fmSmo8bWnIS52qa
x-amz-request-id: 91A8CC60F9FC49E7
Date: Wed, 08 Mar 2006 04:06:15 GMT
Location: /[bucket-name]
Content-Length: 0
Connection: keep-alive
Server: AmazonS3
Put Objects in your Bucket
# put object request
PUT /[bucket-name]/[key-name] HTTP/1.0
Date: Wed, 08 Mar 2006 04:06:16 GMT
Authorization: AWS [aws-access-key-id]:[header-signature]
Host: s3.amazonaws.com
Content-Length: 14
x-amz-meta-title: my title
Content-Type: text/plain

this is a test

# put object response
HTTP/1.1 200 OK
x-amz-id-2: wc15E1LUrjDZhNtT4QZtsbtadnOMKGjw5QTxkRDVO1owwbA6YoiqJJEuKShopufw
x-amz-request-id: 7487CD42C5CA7524
Date: Wed, 08 Mar 2006 04:06:16 GMT
ETag: "54b0c58c7ce9f2a8b551351102ee0938"
Content-Length: 0
Connection: keep-alive
Server: AmazonS3
Retrieve Objects from your bucket
# get object request
GET /[bucket-name]/[key-name] HTTP/1.0
Date: Wed, 08 Mar 2006 04:06:18 GMT
Authorization: AWS [aws-access-key-id]:[header-signature]
Host: s3.amazonaws.com

# get object response
HTTP/1.1 200 OK
x-amz-id-2: FbGpiykb9oJEdJd0bcfwkL6S3lc06X0y7XSeA/GWyRdvlNEZ0irthljxKoeGFfB6
x-amz-request-id: 9298531013923634
Date: Wed, 08 Mar 2006 04:06:18 GMT
Last-Modified: Wed, 08 Mar 2006 04:06:16 GMT
ETag: "54b0c58c7ce9f2a8b551351102ee0938"
x-amz-meta-title: my title
Content-Type: text/plain
Content-Length: 14
Connection: keep-alive
Server: AmazonS3

this is a test
S3 - The PHP way
Implementing an API to S3 with PHP
Prerequisites
You’ll need the PEAR libraries Crypt_HMAC & HTTP_Request (at least things are much easier if you have these)
# sudo pear install Crypt_HMAC
pear.php.net" to update
downloading Crypt_HMAC-1.0.1.tgz ...
Starting to download Crypt_HMAC-1.0.1.tgz (2,149 bytes)
....done: 2,149 bytes
install ok: channel://pear.php.net/Crypt_HMAC-1.0.1

sam$ sudo pear install HTTP_Request
pear.php.net" to update
downloading HTTP_Request-1.4.0.tgz ...
Starting to download HTTP_Request-1.4.0.tgz (15,262 bytes)
.....done: 15,262 bytes
downloading Net_URL-1.0.14.tgz ...
Starting to download Net_URL-1.0.14.tgz (5,173 bytes)
...done: 5,173 bytes
downloading Net_Socket-1.0.7.tgz ...
Starting to download Net_Socket-1.0.7.tgz (5,419 bytes)
...done: 5,419 bytes
install ok: channel://pear.php.net/Net_URL-1.0.14
install ok: channel://pear.php.net/Net_Socket-1.0.7
install ok: channel://pear.php.net/HTTP_Request-1.4.0
Creating the API
At this point all you really need to do is create a function for each needed interaction with S3 (or better yet, a PHP Object with a method for each). So something like…
createBucket()
putObject()
getObject()
getBucketListing()
...
These functions create HTTP requests and read HTTP responses. Sometimes this can be a bit tricky (one missing ‘\n’ and you’re screwed), so leverage what others have done before you. The Amazon web services site has some good examples, but in particular I would recommend you look at ‘Test Utility for Amazon S3 in PHP’, which does a good job of demoing most of the S3 functionality using PHP.
I used this code as a starting point to develop a very simple ‘Rsync’ type application for Amazon S3.
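To give the ‘PHP Object with a method for each’ idea some shape, here is a rough sketch of my own (not the Test Utility code). The class name S3Requests and the array layout are assumptions made for illustration. Each method only describes the HTTP request to send (method, URL, headers, body); actually sending it is left to whatever HTTP library you use, such as PEAR HTTP_Request. Signing uses PHP’s built-in hash_hmac() rather than Crypt_HMAC, and keys are assumed to need no URL encoding.

```php
<?php
class S3Requests
{
    const HOST = 's3.amazonaws.com';

    private $accessKeyId;
    private $secretKey;

    public function __construct(string $accessKeyId, string $secretKey)
    {
        $this->accessKeyId = $accessKeyId;
        $this->secretKey   = $secretKey;
    }

    public function createBucket(string $bucket): array
    {
        return $this->build('PUT', '/' . $bucket);
    }

    public function putObject(string $bucket, string $key, string $body, string $contentType = 'text/plain'): array
    {
        return $this->build('PUT', '/' . $bucket . '/' . $key, $body, $contentType);
    }

    public function getObject(string $bucket, string $key): array
    {
        return $this->build('GET', '/' . $bucket . '/' . $key);
    }

    public function getBucketListing(string $bucket): array
    {
        return $this->build('GET', '/' . $bucket);
    }

    // Shared plumbing: date, signature and headers for every request.
    private function build(string $verb, string $resource, string $body = '', string $contentType = ''): array
    {
        $date = gmdate('D, d M Y H:i:s') . ' GMT';

        // String to sign: verb, Content-MD5 (not used here), Content-Type, date, resource.
        $stringToSign = $verb . "\n\n" . $contentType . "\n" . $date . "\n" . $resource;
        $signature = base64_encode(hash_hmac('sha1', $stringToSign, $this->secretKey, true));

        $headers = [
            'Host'          => self::HOST,
            'Date'          => $date,
            'Authorization' => 'AWS ' . $this->accessKeyId . ':' . $signature,
        ];
        if ($contentType !== '') {
            $headers['Content-Type'] = $contentType;
        }
        if ($verb === 'PUT') {
            $headers['Content-Length'] = (string) strlen($body);
        }

        return [
            'method'  => $verb,
            'url'     => 'http://' . self::HOST . $resource,
            'headers' => $headers,
            'body'    => $body,
        ];
    }
}

// Placeholder credentials, for illustration only.
$s3 = new S3Requests('EXAMPLE-ACCESS-KEY-ID', 'example-secret-access-key');
$request = $s3->putObject('foo-products', '2006/may/1845.prd', 'this is a test');
echo $request['method'], ' ', $request['url'], "\n";
```

Keeping the request building separate from the sending also makes the tricky part (headers and signature) easy to inspect before anything goes over the wire.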
Resources
- Documentation from Amazon
- Example PHP implementation
- Who’s Using S3 here and here
- My Rsync App (Proof of Concept at this point)
Find and Replace in MySQL
PHP Data Objects
DataObject is a design pattern that encapsulates the database behind a simple-to-use interface. This allows the Controller to interact with the database layer without writing SQL. To accomplish this, you build an abstract DataObject base class that implements the base functionality and sets up the interface for a Data Object. Then, for each of your 'Business Objects' (such as customer), you extend the DataObject class and implement the various methods provided by DataObject (see attached code for details).
You'll notice in the posted PHP files below that the code gets more 'complex' as we move down the stack (Controller->BusinessObject->DataObject). This is the beauty of OOP: we implement the DataObject only once in our application, even though it may be used by many Business Objects. As long as we maintain its interface, we are free to refactor its implementation details and add functionality, and there is only one file to change.
Retrieval of an object from the database
Take the simple case of a controller that needs to retrieve a customer. Traditionally, this might involve building a query by hand in the controller itself.
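A sketch of what that traditional approach tends to look like. The customer table, its columns and the in-memory SQLite database are all assumptions of mine so the snippet can run on its own; the original application's schema is not shown in this post.

```php
<?php
// Hypothetical schema and data in an in-memory SQLite database (via PDO).
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, email TEXT)');
$db->exec("INSERT INTO customer (name, email) VALUES ('Ann', 'ann@example.com')");

// Inside the controller: the table name, column names and SQL all live here.
$stmt = $db->prepare('SELECT id, name, email FROM customer WHERE id = :id');
$stmt->execute([':id' => 1]);
$customer = $stmt->fetch(PDO::FETCH_ASSOC);

echo $customer['name'], "\n"; // prints "Ann"
```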
This process involves the Controller having far too much knowledge of the database layer. This tight coupling should be avoided.
With a data object, the same retrieval comes down to creating the object and calling a method on it, with no SQL in the controller.
Search the database for an Object
This is quite elegant with DOs. Simply create a new DO, set what you do know about what you are searching for, then execute the object's find method.
Saving and Updating Objects
Again, this is very simple. For a populated customer DO, call its insert method to save it and its update method to update it.
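Since the attached files are not reproduced here, the following is a minimal sketch of the pattern as described above, not the original code. The method names get, find, insert and update follow the text; everything else (the PDO/SQLite backing, the table() and columns() hooks, the customer schema) is assumed for illustration.

```php
<?php
// Base class: the SQL lives here, once, for every business object.
abstract class DataObject
{
    public $id;
    protected $db;

    public function __construct(PDO $db) { $this->db = $db; }

    abstract protected function table(): string;   // table name
    abstract protected function columns(): array;  // column names, without id

    // Load the row with this primary key into the object.
    public function get(int $id): bool
    {
        $stmt = $this->db->prepare("SELECT * FROM {$this->table()} WHERE id = ?");
        $stmt->execute([$id]);
        $row = $stmt->fetch(PDO::FETCH_ASSOC);
        foreach ($row ?: [] as $column => $value) {
            $this->$column = $value;
        }
        return $row !== false;
    }

    // Return every row matching the properties that are currently set.
    public function find(): array
    {
        $known = $this->known();
        $sql = "SELECT * FROM {$this->table()}";
        if ($known) {
            $sql .= ' WHERE ' . implode(' AND ', $this->placeholders($known));
        }
        $stmt = $this->db->prepare($sql);
        $stmt->execute(array_values($known));
        return $stmt->fetchAll(PDO::FETCH_ASSOC);
    }

    public function insert(): void
    {
        $known = $this->known();
        $cols  = implode(', ', array_keys($known));
        $marks = implode(', ', array_fill(0, count($known), '?'));
        $stmt  = $this->db->prepare("INSERT INTO {$this->table()} ($cols) VALUES ($marks)");
        $stmt->execute(array_values($known));
        $this->id = (int) $this->db->lastInsertId();
    }

    public function update(): void
    {
        $known = $this->known();
        $sets  = implode(', ', $this->placeholders($known));
        $stmt  = $this->db->prepare("UPDATE {$this->table()} SET $sets WHERE id = ?");
        $stmt->execute(array_merge(array_values($known), [$this->id]));
    }

    // Columns that currently hold a value, as column => value.
    private function known(): array
    {
        $known = [];
        foreach ($this->columns() as $column) {
            if (isset($this->$column)) {
                $known[$column] = $this->$column;
            }
        }
        return $known;
    }

    // Turn ['name' => 'Ann'] into ['name = ?'].
    private function placeholders(array $known): array
    {
        $out = [];
        foreach (array_keys($known) as $column) {
            $out[] = "$column = ?";
        }
        return $out;
    }
}

// A business object only names its table and columns.
class Customer extends DataObject
{
    public $name;
    public $email;
    protected function table(): string { return 'customer'; }
    protected function columns(): array { return ['name', 'email']; }
}

// Usage against an in-memory SQLite database (hypothetical schema).
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, email TEXT)');

$new = new Customer($db);
$new->name  = 'Ann';
$new->email = 'ann@example.com';
$new->insert();                       // save

$search = new Customer($db);
$search->name = 'Ann';
$matches = $search->find();           // search by what we know

$customer = new Customer($db);
$customer->get($new->id);             // retrieve by id
$customer->email = 'ann@example.org';
$customer->update();                  // update
```

Note that the controller-level code at the bottom never mentions SQL; swapping the storage details means touching only the DataObject class.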

