Amazon S3 Tools, Using PHP

If you haven’t heard of Amazon S3, check it out. It’s remote storage for your files, at $0.15 per GB per month. You sign up for an account for free, then pay at the end of each month for the storage and data transfer you used. It’s nice and cheap, and I find it better than an FTP account for backing up files, for a few reasons:
1) It’s cheap: you pay only for what you use.
2) The interface is all HTTP REST, which makes it easier to work with in code.
3) It’s cheap.
4) You can make selected files publicly readable and available via an HTTP address.
5) There is a Firefox extension, S3 Organizer, that looks like an FTP client, so you can move files back and forth from your desktop.
6) It’s all the hype right now.
7) It’s cheap remote backup.

The HTTP REST interface is easy to use with PHP. I made a few command line utils with PHP:
s3ls <path> – lists file details in S3 account that are prefixed with <path>
s3put <file> – stores file in S3 account in the same dir it’s in on your server
s3get <file> – retrieves file from S3 account – can use absolute path, else it assumes current working directory
s3syncdir <dir> – removes files from S3 that no longer exist on server, then uploads any missing or modified files to S3
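
All four utils share one naming rule: the S3 key is the file's absolute path on the server with the leading slash removed. A minimal sketch of that rule follows. The two function names are made up for illustration and are not part of the utils, which also resolve the path with realpath() first.

```php
<?php
// Hypothetical helper: map an absolute local path to the S3 key
// the utils would store it under (leading slashes stripped).
function localPathToS3Key($path)
{
    return preg_replace('#^/+#', '', $path);
}

// Hypothetical helper: map an S3 key back to the local path it came from.
function s3KeyToLocalPath($key)
{
    return '/' . $key;
}

echo localPathToS3Key('/var/www/site/index.html'), "\n";
echo s3KeyToLocalPath('var/www/site/index.html'), "\n";
```

Because the full path is part of the key, two servers sharing one bucket would overwrite each other's files wherever their paths match.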

I found an S3 class online and am using it as my starting point; it’s the foundation for the utilities. I’ll start by listing the source code of the utils, which won’t work until you have the supporting S3.class.php and Xml.class.php files, listed below. You’ll notice how simple the utils are: they’re only a few lines of code each. This is the power of OO. The S3 class makes it very easy to write the utils. Start by making a home for these files; /usr/local/s3files would be fine. Handle these prerequisites, make the set of files, then give the utils a whirl.

PEAR modules:
pear install Crypt_HMAC
pear install HTTP_Request

php.ini
You’ll also need to edit your php.ini file and change your ‘memory_limit’ to something very high: about three times the size of the largest file you intend to upload to S3. This kind of sucks, I know. The utils pull each file into memory instead of streaming it to S3, and this poor memory management is the one thing that really bugs me about this solution. I’ll fix it when I have enough time on my hands to figure out the best way to stream the files. Don’t let this worry you: as long as your files aren’t huge, or you have enough memory to crank your memory_limit up, you’ll be fine. I set mine at 1 GB, and assume I can upload files of up to 333 MB.
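
To make the three-times rule concrete, here is a rough sketch that turns a memory_limit value into the largest file size the utils could be expected to handle. Both function names are made up for illustration, and the factor of three is just the rule of thumb above, not a measured figure.

```php
<?php
// Convert a php.ini shorthand size such as "512M" or "1G" into bytes.
function iniSizeToBytes($value)
{
    $value  = trim($value);
    $number = (int) $value;                    // "512M" casts to 512
    switch (strtoupper(substr($value, -1))) {
        case 'G': return $number * 1024 * 1024 * 1024;
        case 'M': return $number * 1024 * 1024;
        case 'K': return $number * 1024;
        default:  return $number;
    }
}

// Largest upload, in bytes, under the "limit = 3 x file size" rule of thumb.
// Returns null when memory_limit is -1 (unlimited).
function maxUploadBytes($memoryLimit)
{
    $bytes = iniSizeToBytes($memoryLimit);
    if ($bytes < 0) {
        return null;
    }
    return (int) floor($bytes / 3);
}

$max = maxUploadBytes(ini_get('memory_limit'));
echo ($max === null) ? "no memory limit set\n" : "largest safe upload: $max bytes\n";
```

A check like this could run before file_get_contents() in s3put, so an oversized file is refused up front instead of running out of memory halfway through.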

Dedicated Bucket
Create a bucket in your S3 account. I suggest using S3 Organizer in Firefox. It should be a bucket you’re only going to use in conjunction with these utils on this server. If you’re going to use it on more than one server, make a different bucket for each server.

s3config.php
I suggest making this root owned and only readable by root:
chmod 600 s3config.php
Reason: Your AWS secret key is stored in this file. The utils (s3ls, s3put, s3get and s3syncdir) all require s3config.php, so if you make it readable by root only, the utils will fail for anyone but root. For me, this was OK; I only want root to be able to back up to my S3 account anyway.
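
If you want the utils to refuse to run when the config file has been left open, a check along these lines could go at the top of each one. This is only a sketch: configIsPrivate is a made-up name, and it looks at the permission bits only, not at who owns the file.

```php
<?php
// Hypothetical check: true only when the file exists and its permission
// bits are exactly 0600 (read/write for the owner, nothing for anyone else).
function configIsPrivate($path)
{
    clearstatcache();
    if (!file_exists($path)) {
        return false;
    }
    return (fileperms($path) & 0777) === 0600;
}

// Possible use at the top of a util (commented out so the sketch has no side effects):
// if (!configIsPrivate(dirname(__FILE__) . '/s3config.php'))
//     die("s3config.php should be chmod 600\n");
```

fileowner() could be added to the test to confirm the owner is root (uid 0) as well.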

chmod and symbolic links
Not required, but I found this useful, so it’s a tip: chmod 755 your s3ls, s3put, s3get, s3syncdir files and make sym links in your /usr/local/bin/ dir.

Now, to the source code. Pick a home for these files, edit s3config.php and give it a whirl (my blog editor tends to strip indentation, so if any of the code looks flat, sorry, but hey, this is free code, right!? You can fix it :).

s3config.php

<?php

$bucket = "[some bucket name you already created]";
$keyId = "[your aws key]";
$secretKey = "[your aws secretKey]";
$S3_URL = "https://s3.amazonaws.com/";

?>

s3ls

#!/usr/bin/php -q
<?php
require_once 'Crypt/HMAC.php';
require_once 'HTTP/Request.php'; // see sample code for note on bug in this package
require_once 'S3.class.php';
require_once 's3config.php';

// The path doubles as the S3 key prefix, so it must be a real local directory.
$dir = isset($argv[1]) ? $argv[1] : '';
if ( !is_readable($dir) || !is_dir($dir) )
    die("'$dir' is not readable or is not a directory.\nUsage: {$_SERVER['_']} <path_to_dir>\n");

$dir = realpath($dir);
// S3 keys are stored without the leading slash (ereg_* is gone from newer PHP, so use preg_*)
$s3dir = preg_replace('#^/+#', '', $dir);

$s3 = new S3($keyId, $secretKey);
$objects = $s3->getObjects($bucket, $s3dir); // raw XML listing
$s3files = $s3->parseObjects($objects);      // key => filesize / lastmodified
print_r($objects);
print_r($s3files);

?>

s3put

#!/usr/bin/php -q
<?php
require_once 'Crypt/HMAC.php';
require_once 'HTTP/Request.php';
require_once 'S3.class.php';
require_once 's3config.php';

$file = isset($argv[1]) ? $argv[1] : '';
if ( !is_readable($file) || !is_file($file) )
    die("'$file' is not a readable file.\nUsage: {$_SERVER['_']} <path_to_file>\n");

// absolute local path, minus the leading slash, becomes the S3 key
$file = realpath(dirname($file))."/".basename($file);
$s3file = preg_replace('#^/+#', '', $file);

$s3 = new S3($keyId, $secretKey);
$data = file_get_contents($file); // whole file is held in memory, see the php.ini note
$s3->putObject ($s3file, $data, $bucket, 'private', 'application/binary');
print "Finished writing S3:$bucket/$s3file\n";

?>

s3get

#!/usr/bin/php -q
<?php
require_once 'Crypt/HMAC.php';
require_once 'HTTP/Request.php';
require_once 'S3.class.php';
require_once 's3config.php';

$file = isset($argv[1]) ? $argv[1] : '';
if ( $file == '' )
    die("Usage: {$_SERVER['_']} <path_to_file>\n");

// relative paths are taken from the current working directory
if ( !preg_match('#^/#', $file) )
    $file = getcwd() ."/". $file;
$dir = realpath(dirname($file));
if ( $dir === false )
    die("The directory for '$file' does not exist.\n");
$file = $dir ."/". basename($file);
$s3file = preg_replace('#^[./]+#', '', $file);

$s3 = new S3($keyId, $secretKey);
$data = $s3->getObject ($s3file, $bucket);
file_put_contents($file, $data);
print "Wrote $file, check it, though, to be sure we really fetched it from S3\n";

?>

s3syncdir

#!/usr/bin/php -q
<?php
require_once 'Crypt/HMAC.php';
require_once 'HTTP/Request.php';
require_once 'S3.class.php';
require_once 's3config.php';
require_once 'Xml.class.php';

$dir = isset($argv[1]) ? $argv[1] : '';
if ( !is_readable($dir) || !is_dir($dir) )
    die("'$dir' is not readable or is not a directory.\nUsage: {$_SERVER['_']} <path_to_dir>\n");

$dir = realpath($dir);
$s3dir = preg_replace('#^/+#', '', $dir);

$s3 = new S3($keyId, $secretKey);
$objects = $s3->getObjects($bucket, $s3dir);
$s3files = $s3->parseObjects($objects);

// clean up deleted files first:
foreach ($s3files as $filename=>$attribs)
{
    if ( !file_exists("/$filename") )
    {
        print "/$filename no longer exists on server, removing from s3\n";
        $s3->deleteObject ($filename, $bucket);
    }
}

// write files that don't exist on S3, or have changed
// (escapeshellarg keeps odd directory names from breaking the find command)
$list = trim((string) shell_exec('find ' . escapeshellarg($dir) . ' -type f'));
$localfiles = ($list === '') ? array() : explode("\n", $list);
foreach ($localfiles as $file)
{
    $file = realpath(dirname($file))."/".basename($file);
    $s3file = preg_replace('#^/+#', '', $file);

    $doit = false;
    if ( !isset($s3files[$s3file]) )
    {
        print "$file exists on server, not on S3, copying...\n";
        $doit = true;
    }
    else if ( filesize($file) != $s3files[$s3file]['filesize'] )
    {
        print "$file (".filesize($file).") has a different size on S3 (".$s3files[$s3file]['filesize']."), copying...\n";
        $doit = true;
    }
    #else if ( gmdate(DATE_RFC822, filemtime($file)) != $s3files[$s3file]['lastmodified'] )
    #{
    #    print "$file (".gmdate(DATE_RFC822, filemtime($file)).") has a different date than on S3 (". $s3files[$s3file]['lastmodified']."), copying...\n";
    #    $doit = true;
    #}
    if ( $doit )
    {
        // putObject takes the data by reference, so it needs a real variable
        $data = file_get_contents($file);
        $s3->putObject ($s3file, $data, $bucket, 'private', 'application/binary');
    }
}

?>
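
The decision logic in s3syncdir can be pulled out into a function that touches neither the disk nor S3, which makes it easy to try on sample data. A simplified sketch: planSync is a made-up name, the local files are assumed to be given as "S3 key => size in bytes", and the remote files use the array shape that parseObjects returns.

```php
<?php
// Hypothetical helper: decide which keys to delete from S3 and which to upload.
//   $local  : array of S3 key => local file size in bytes
//   $remote : array of S3 key => array('filesize' => ..., 'lastmodified' => ...)
function planSync(array $local, array $remote)
{
    $delete = array();
    $upload = array();

    // On S3 but gone from the server: delete.
    foreach ($remote as $key => $attribs) {
        if (!array_key_exists($key, $local)) {
            $delete[] = $key;
        }
    }

    // Missing on S3, or stored with a different size: upload.
    foreach ($local as $key => $size) {
        if (!isset($remote[$key]) || (int) $remote[$key]['filesize'] !== (int) $size) {
            $upload[] = $key;
        }
    }

    return array('delete' => $delete, 'upload' => $upload);
}

$plan = planSync(
    array('var/www/a.txt' => 10, 'var/www/b.txt' => 5),
    array(
        'var/www/a.txt' => array('filesize' => '10', 'lastmodified' => null),
        'var/www/old.txt' => array('filesize' => '3', 'lastmodified' => null),
    )
);
print_r($plan);
```

As in s3syncdir itself, only sizes are compared, so a file that was edited without changing its length is not uploaded again.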

And now for the supporting files:

S3.class.php

<?php
/**
 * Amazon S3 REST API Implementation
 *
 * This is a generic PHP class that can hook in to Amazon's S3 Simple Storage Service
 *
 * Contributions and/or donations are welcome.
 *
 * Author: Geoffrey P. Gaudreault
 * http://www.neurofuzzy.net
 *
 * This code is free, provided AS-IS with no warranty expressed or implied. Use at your own risk.
 * If you find errors or bugs in this code, please contact me at interested@zanpo.com
 * If you enhance this code in any way, please send me an update. Thank you!
 *
 * Version: 0.31a
 * Last Updated: 9/09/2006
 *
 * NOTE: PASS YOUR API KEY ID AND SECRET KEY TO THE CONSTRUCTOR (see the modifications below)
 *
 * 2/10/2008 - Modifications made by David Koopman to:
 * Move the keyId and secretKey into the constructor
 * Remove the set of get/set methods
 * Make $objectdata a pass by reference var, since it is likely to be very large
 * Add the method parseObjects
 *
 */

// REQUIRES PEAR PACKAGES
// get with "pear install Crypt_HMAC" and "pear install HTTP_Request"
require_once 'Crypt/HMAC.php';
require_once 'HTTP/Request.php';
require_once 'Xml.class.php';

class S3 {

    // The API access point URL
    var $S3_URL = "https://s3.amazonaws.com/";

    // list of valid actions (validation not implemented yet)
    var $verbs = array("GET"=>1, "DELETE"=>1, "PUT"=>1);

    // set to true to echo debug info
    var $_debug = false;

    // -----------------------------------------
    // your API key ID
    var $keyId = ""; // to be set in the constructor

    // your API Secret Key
    var $secretKey = ""; // to be set in the constructor
    // -----------------------------------------

    // default action
    var $_verb = "GET";

    // default ACL
    var $_acl = "private";

    // default content type
    #var $_contentType = "image/jpeg";
    var $_contentType = "application/binary";

    // default response content type
    var $_responseContentType = "text/xml";

    // bucket object name prefix
    var $prefix = "";

    // bucket list marker (useful for pagination)
    var $marker = "";

    // number of keys to retrieve in a list
    var $max_keys = "";

    // list delimiter
    var $delimiter = "";

    // your default bucket name
    var $bucketname = "modphpbackup";

    // your current object name
    var $objectname = ""; // to be set later

    /*
     * Constructor: Amazon S3 REST API implementation
     */
    function __construct($keyId, $secretKey, $options = NULL) {

        // Newer PHP versions predefine DATE_RFC822, so only define it when it is missing
        if (!defined('DATE_RFC822')) {
            define('DATE_RFC822', 'D, d M Y H:i:s T');
        }
        $this->httpDate = gmdate(DATE_RFC822);
        $this->keyId = $keyId;
        $this->secretKey = $secretKey;

        $available_options = array("acl", "contentType");

        if (is_array($options)) {

            foreach ($options as $key => $value) {

                $this->debug_text("Option: $key");

                if (in_array($key, $available_options) ) {

                    $this->debug_text("Valid Config options: $key");
                    $property = '_'.$key;
                    $this->$property = $value;
                    $this->debug_text("Setting $property to $value");

                } else {

                    $this->debug_text("ERROR: Config option: $key is not a valid option");

                }

            }

        }

        // "=& new" is no longer accepted by newer PHP versions; plain "=" behaves the same here
        $this->hasher = new Crypt_HMAC($this->secretKey, "sha1");
    }

    /*
     * Method: sendRequest
     * Sends the request to S3
     *
     * Parameters:
     * resource - the name of the resource to act upon
     * verb - the action to apply to the resource (GET, PUT, DELETE, HEAD)
     * objectdata - the source data (body) of the resource (only applies to objects)
     * acl - the access control policy for the resource
     * contentType - the contentType of the resource (only applies to objects)
     * metadata - any metadata you want to save in the header of the object
     */
    function sendRequest ($resource, $verb = NULL, &$objectdata = NULL, $acl = NULL, $contentType = NULL, $metadata = NULL) {

        if ($verb == NULL) {
            $verb = $this->_verb; // fall back to the default action
        }

        if ($acl == NULL) {
            $aclstring = "";
        } else {
            $aclstring = "x-amz-acl:$acl\n";
        }

        $contenttypestring = "";

        if ($contentType != NULL && ($verb == "PUT") && ($objectdata != NULL) && ($objectdata != "")) {
            $contenttypestring = "$contentType";
        }

        // update date / time on each request
        $this->httpDate = gmdate(DATE_RFC822);
        $httpDate = $this->httpDate;

        // build the query string from the list options set by getObjects()
        $paramstring = "";
        $delim = "?";

        if (strlen($this->prefix)) {
            $paramstring .= $delim."prefix=".urlencode($this->prefix);
            $delim = "&";
        }

        if (strlen($this->marker)) {
            $paramstring .= $delim."marker=".urlencode($this->marker);
            $delim = "&";
        }

        if (strlen($this->max_keys)) {
            $paramstring .= $delim."max-keys=".$this->max_keys;
            $delim = "&";
        }

        if (strlen($this->delimiter)) {
            $paramstring .= $delim."delimiter=".urlencode($this->delimiter);
            $delim = "&";
        }

        $this->debug_text("HTTP Request sent to: " . $this->S3_URL . $resource . $paramstring);

        $req = new HTTP_Request($this->S3_URL . $resource . $paramstring);
        $req->setMethod($verb);

        if (($objectdata != NULL) && ($objectdata != "")) {

            $contentMd5 = $this->hex2b64(md5($objectdata));
            $req->addHeader("CONTENT-MD5", $contentMd5);
            $this->debug_text("MD5 HASH OF DATA: " . $contentMd5);

            $contentmd5string = $contentMd5;

        } else {

            $contentmd5string = "";

        }

        if (strlen($contenttypestring)) {
            $this->debug_text("Setting content type to $contentType");
            $req->addHeader("CONTENT-TYPE", $contentType);
        }

        $req->addHeader("DATE", $httpDate);

        if (strlen($aclstring)) {
            $this->debug_text("Setting acl string to $acl");
            $req->addHeader("x-amz-acl", $acl);
        }

        $metadatastring = "";

        if (is_array($metadata)) {

            ksort($metadata);

            $this->debug_text("Metadata found.");

            foreach ($metadata as $key => $value) {

                $metadatastring .= "x-amz-meta-".$key.":".trim($value)."\n";

                $req->addHeader("x-amz-meta-".$key, trim($value));

                $this->debug_text("Setting x-amz-meta-$key to '$value'");

            }

        }

        if (($objectdata != NULL) && ($objectdata != "")) {
            $req->setBody($objectdata);
        }

        // sign the request: verb, MD5, content type, date, x-amz-* headers, resource
        $stringToSign = "$verb\n$contentmd5string\n$contenttypestring\n$httpDate\n$aclstring$metadatastring/$resource";
        $this->debug_text("Signing String: $stringToSign");
        $signature = $this->hex2b64($this->hasher->hash($stringToSign));
        $req->addHeader("Authorization", "AWS " . $this->keyId . ":" . $signature);

        $req->sendRequest();

        $this->_responseContentType = $req->getResponseHeader("content-type");

        // return the body if there is one, otherwise the response headers
        if (strlen($req->getResponseBody())) {

            $this->debug_text($req->getResponseBody());
            return $req->getResponseBody();

        } else {

            $this->debug_text($req->getResponseHeader());
            return $req->getResponseHeader();

        }

    }

    /*
     * Method: getBuckets
     * Returns a list of all buckets
     */
    function getBuckets () {
        return $this->sendRequest("", "GET");
    }

    /*
     * Method: getBucket
     * Gets a list of all objects in the given bucket, or in the default bucket if none is supplied
     */
    function getBucket ($bucketname = NULL) {

        if ($bucketname == NULL) {
            $bucketname = $this->bucketname;
        }

        return $this->sendRequest($bucketname, "GET");

    }

    /*
     * Method: getObjects
     * Gets a list of all objects in the specified bucket
     *
     * Parameters:
     * prefix - (optional) Limits the response to keys which begin with the indicated prefix. You can use prefixes to separate a bucket into different sets of keys in a way similar to how a file system uses folders.
     * marker - (optional) Indicates where in the bucket to begin listing. The list will only include keys that occur lexicographically after marker. This is convenient for pagination: To get the next page of results use the last key of the current page as the marker.
     * max-keys - (optional) The maximum number of keys you'd like to see in the response body. The server may return fewer than this many keys, but will not return more.
     */
    function getObjects ($bucketname, $prefix = NULL, $marker = NULL, $max_keys = NULL, $delimiter = NULL) {

        // remember the list options; sendRequest() turns them into the query string
        $this->prefix    = ($prefix != NULL) ? $prefix : "";
        $this->marker    = ($marker != NULL) ? $marker : "";
        $this->max_keys  = ($max_keys != NULL) ? $max_keys : "";
        $this->delimiter = ($delimiter != NULL) ? $delimiter : "";

        if ($bucketname != NULL) {
            return $this->sendRequest($bucketname, "GET");
        } else {
            return false;
        }

    }

    /*
     * Method: getObjectInfo
     * Get header information about the object. The HEAD operation is used to retrieve information about a specific object,
     * without actually fetching the object itself
     *
     * Parameters:
     * objectname - The name of the object to get information about
     * bucketname - (optional) the name of the bucket containing the object. If none is supplied, the default bucket is used
     */
    function getObjectInfo ($objectname, $bucketname = NULL) {
        if ($bucketname == NULL) {
            $bucketname = $this->bucketname;
        }
        return $this->sendRequest($bucketname."/".$objectname, "HEAD");
    }

    /*
     * Method: getObject
     * Gets an object from S3
     *
     * Parameters:
     * objectname - the name of the object to get
     * bucketname - (optional) the name of the bucket containing the object. If none is supplied, the default bucket is used
     */
    function getObject ($objectname, $bucketname = NULL) {
        if ($bucketname == NULL) {
            $bucketname = $this->bucketname;
        }
        return $this->sendRequest($bucketname."/".$objectname, "GET");
    }

    /*
     * Method: putBucket
     * Creates a new bucket in S3
     *
     * Parameters:
     * bucketname - the name of the bucket. It must be unique. No other S3 users may have this bucket name
     */
    function putBucket ($bucketname) {
        return $this->sendRequest($bucketname, "PUT");
    }

    /*
     * Method: putObject
     * Puts an object into S3
     *
     * Parameters:
     * objectname - the name of the object to put
     * objectdata - the source data (body) of the resource (only applies to objects)
     * bucketname - (optional) the name of the bucket containing the object. If none is supplied, the default bucket is used
     * acl - the access control policy for the resource
     * contentType - the contentType of the resource (only applies to objects)
     * metadata - any metadata you want to save in the header of the object
     */
    function putObject ($objectname, &$objectdata, $bucketname = NULL, $acl = NULL, $contentType = NULL, $metadata = NULL) {

        if ($bucketname == NULL) {
            $bucketname = $this->bucketname;
        }

        if ($acl == NULL || $acl == "") {
            $acl = $this->_acl;
        }

        if ($contentType == NULL || $contentType == "") {
            $contentType = $this->_contentType;
        }

        if ($objectdata != NULL) {
            return $this->sendRequest($bucketname."/".$objectname, "PUT", $objectdata, $acl, $contentType, $metadata);
        } else {
            return false;
        }

    }

    /*
     * Method: deleteBucket
     * Deletes bucket in S3. The bucket name will fall into the public domain.
     */
    function deleteBucket ($bucketname) {
        return $this->sendRequest($bucketname, "DELETE");
    }

    /*
     * Method: deleteObject
     * Deletes an object from S3
     *
     * Parameters:
     * objectname - the name of the object to delete
     * bucketname - (optional) the name of the bucket containing the object. If none is supplied, the default bucket is used
     */
    function deleteObject ($objectname, $bucketname = NULL) {

        if ($bucketname == NULL) {
            $bucketname = $this->bucketname;
        }

        return $this->sendRequest($bucketname."/".$objectname, "DELETE");

    }

    /*
     * Method: hex2b64
     * Utility function for constructing signatures:
     * converts a hex string to raw bytes, then base64-encodes them
     */
    function hex2b64($str) {

        $raw = '';
        for ($i=0; $i < strlen($str); $i+=2) {
            $raw .= chr(hexdec(substr($str, $i, 2)));
        }
        return base64_encode($raw);

    }

    /*
     * Method: debug_text
     * Echoes debug information to the browser. Set $this->_debug to false for production use
     */
    function debug_text($text) {

        if ($this->_debug) {
            echo("<br>\n");
            print_r($text);
            echo("<br><br>\n\n");
        }

        return true;

    }

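    /*
     * Method: parseObjects
     * Turns the XML returned by getObjects into an array of
     * key => array('filesize' => ..., 'lastmodified' => ...)
     */
    function parseObjects( $objects )
    {
        $x = new Xml();
        $x->parse($objects);

        // an empty listing leaves the parser with nothing to report
        $structure = is_array($x->structure) ? $x->structure : array();
        $count = count($structure);

        $s3files = array();
        for ($i = 0; $i < $count; $i++)
        {
            $item = $structure[$i];
            if ( $item['tag'] == 'KEY' )
            {
                $filename = $item['data'];
                $lastmodified = null;
                $filesize = null;
                $found = 0;
                // scan forward for this key's LASTMODIFIED and SIZE entries,
                // without stepping past the end of the list
                while ( $i + 1 < $count && $found < 2 )
                {
                    $i++;
                    $item = $structure[$i];
                    if ( $item['tag'] == 'LASTMODIFIED' )
                    {
                        $lastmodified = $item['data'];
                        $found++;
                    }
                    else if ( $item['tag'] == 'SIZE' )
                    {
                        $filesize = $item['data'];
                        $found++;
                    }
                }
                $s3files[$filename] = array('filesize'=>$filesize, 'lastmodified'=>$lastmodified);
            }
        }
        return $s3files;
    }

}

?>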
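
The signing step inside sendRequest can be tried on its own. Here is a minimal sketch, assuming PHP's built-in hash_hmac() in place of the PEAR Crypt_HMAC class the S3 class uses. The function names, key ID, secret, date and bucket are all made up for illustration.

```php
<?php
// Build the string to sign in the same field order as sendRequest():
// verb, Content-MD5, Content-Type, date, x-amz-* headers, then "/resource".
function buildStringToSign($verb, $contentMd5, $contentType, $httpDate, array $amzHeaders, $resource)
{
    ksort($amzHeaders);
    $amz = '';
    foreach ($amzHeaders as $name => $value) {
        $amz .= strtolower($name) . ':' . trim($value) . "\n";
    }
    return "$verb\n$contentMd5\n$contentType\n$httpDate\n" . $amz . '/' . $resource;
}

// HMAC-SHA1 the string with the secret key and base64-encode the raw digest.
// This gives the same result as hex2b64() applied to the hex digest.
function signString($secretKey, $stringToSign)
{
    return base64_encode(hash_hmac('sha1', $stringToSign, $secretKey, true));
}

$stringToSign = buildStringToSign(
    'PUT', '', 'application/binary', 'Sat, 09 Feb 2008 12:00:00 GMT',
    array('x-amz-acl' => 'private'),
    'example-bucket/var/www/index.html'
);
$header = 'AWS EXAMPLE-KEY-ID:' . signString('EXAMPLE-SECRET', $stringToSign);
echo "Authorization: $header\n";
```

The Content-MD5 field is left empty here to keep the example short; sendRequest fills it in whenever there is a body to upload.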

Xml.class.php

<?php
class Xml {
    var $parser;
    var $structure = array(); // one entry per chunk of character data
    var $currentTag;
    var $currentAttributes;

    function __construct()
    {
        $this->parser = xml_parser_create();

        xml_set_object($this->parser, $this);
        xml_set_element_handler($this->parser, "tag_open", "tag_close");
        xml_set_character_data_handler($this->parser, "cdata");
    }

    function parse($data)
    {
        xml_parse($this->parser, $data);
    }

    function tag_open($parser, $tag, $attributes)
    {
        #var_dump($parser, $tag, $attributes);
        $this->currentTag = $tag;
        $this->currentAttributes = $attributes;
    }

    function cdata($parser, $cdata)
    {
        $this->structure[] = array('tag'=>$this->currentTag, 'attributes'=>$this->currentAttributes, 'data' => $cdata);
        #var_dump($parser, $cdata);
        #print "\n--------------------\n";
        #print_r($parser);
        #print "\n--\n";
        #print_r($cdata);
    }

    function tag_close($parser, $tag)
    {
        #var_dump($parser, $tag);
        $this->currentTag = null;
        $this->currentAttributes = null;
    }

} // end of class xml

?>

That’s it for now. I’ll follow this up with modifications. As of today, Feb 9, 2008, this is version 0.1.

DaveK
