NAME
S3 - Amazon S3 Web Service Interface
SYNOPSIS
package require Tcl 8.5
package require sha1 1.0
package require md5 2.0
package require base64 2.3
package require xsxp 1.0
S3::Configure ?-reset boolean? ?-retries integer? ?-accesskeyid idstring? ?-secretaccesskey idstring? ?-service-access-point FQDN? ?-use-tls boolean? ?-default-compare always|never|exists|missing|newer|date|checksum|different? ?-default-separator string? ?-default-acl private|public-read|public-read-write|authenticated-read|keep|calc? ?-default-bucket bucketname?
S3::SuggestBucket ?name?
S3::REST dict
S3::ListAllMyBuckets ?-blocking boolean? ?-parse-xml xmlstring? ?-result-type REST|xml|pxml|dict|names|owner?
S3::PutBucket ?-bucket bucketname? ?-blocking boolean? ?-acl {}|private|public-read|public-read-write|authenticated-read?
S3::DeleteBucket ?-bucket bucketname? ?-blocking boolean?
S3::GetBucket ?-bucket bucketname? ?-blocking boolean? ?-parse-xml xmlstring? ?-max-count integer? ?-prefix prefixstring? ?-delimiter delimiterstring? ?-result-type REST|xml|pxml|names|dict?
S3::Put ?-bucket bucketname? -resource resourcename ?-blocking boolean? ?-file filename? ?-content contentstring? ?-acl private|public-read|public-read-write|authenticated-read|calc|keep? ?-content-type contenttypestring? ?-x-amz-meta-* metadatatext? ?-compare comparemode?
S3::Get ?-bucket bucketname? -resource resourcename ?-blocking boolean? ?-compare comparemode? ?-file filename? ?-content contentvarname? ?-timestamp aws|now? ?-headers headervarname?
S3::Head ?-bucket bucketname? -resource resourcename ?-blocking boolean? ?-dict dictvarname? ?-headers headersvarname? ?-status statusvarname?
S3::GetAcl ?-blocking boolean? ?-bucket bucketname? -resource resourcename ?-result-type REST|xml|pxml?
S3::PutAcl ?-blocking boolean? ?-bucket bucketname? -resource resourcename ?-acl new-acl?
S3::Delete ?-bucket bucketname? -resource resourcename ?-blocking boolean? ?-status statusvar?
S3::Push ?-bucket bucketname? -directory directoryname ?-prefix prefixstring? ?-compare comparemode? ?-x-amz-meta-* metastring? ?-acl aclcode? ?-delete boolean? ?-error throw|break|continue? ?-progress scriptprefix?
S3::Pull ?-bucket bucketname? -directory directoryname ?-prefix prefixstring? ?-blocking boolean? ?-compare comparemode? ?-delete boolean? ?-timestamp aws|now? ?-error throw|break|continue? ?-progress scriptprefix?
S3::Toss ?-bucket bucketname? -prefix prefixstring ?-blocking boolean? ?-error throw|break|continue? ?-progress scriptprefix?
DESCRIPTION
This package provides access to Amazon's Simple Storage Service (S3) web service.
As a quick summary, Amazon S3 is a for-fee web service that allows the storage
of arbitrary data as "resources" within "buckets" online. See
http://www.amazonaws.com/ for details
on that system. Access to the service is via HTTP (SOAP or REST). Much of this
documentation will not make sense if you're not familiar with the terms and
functionality of the Amazon S3 service.
This package provides services for reading and writing the data items via the
REST interface. It also provides some higher-level operations. Other packages
in the same distribution provide for even more functionality.
Copyright 2006 Darren New. All Rights Reserved. NO WARRANTIES OF ANY TYPE ARE
PROVIDED. COPYING OR USE INDEMNIFIES THE AUTHOR IN ALL WAYS. This software is
licensed under essentially the same terms as Tcl. See LICENSE.txt for the
terms.
ERROR REPORTING
The error reporting from this package makes use of $errorCode to provide more
details on what happened than simply throwing an error. Any error caught by
the S3 package (and we try to catch them all) will return with an $errorCode
being a list having at least three elements. In all cases, the first element
will be "S3". The second element will take on one of six values,
with that element defining the value of the third and subsequent elements.
S3::REST does not throw an error (apart from the socket errors governed by its
throwsocket key), but instead returns a dictionary with the keys "error",
"errorInfo", and "errorCode" set. This allows for reliable background use. The
possible second elements are these:
- usage
- The usage of the package is incorrect. For example, a
command has been invoked which requires the library to be configured
before the library has been configured, or an invalid combination of
options has been specified. The third element of $errorCode supplies the
name of the parameter that was wrong. The fourth usually provides the
arguments that were actually supplied to the throwing proc, unless the
usage error isn't confined to a single proc.
- local
- Something happened on the local system which threw an
error. For example, a request to upload or download a file was made and
the file permissions denied that sort of access. The third element of
$errorCode is the original $errorCode.
- socket
- The socket closed prematurely, or some other failure to
communicate with Amazon was detected. The third element of $errorCode is
usually the original $errorCode, but is sometimes the message from fcopy or
some other detail.
- remote
- The Amazon web service returned an error code outside the
2XX range in the HTTP header. In other words, everything went as
documented, except this particular case was documented not to work. The
third element is the dictionary returned from ::S3::REST. Note that
S3::REST itself never throws this error, but just returns the dictionary.
Most of the higher-level commands throw for convenience, unless an
argument indicates they should not. If something is documented as
"not throwing an S3 remote error", it means a status return is
set rather than throwing an error if Amazon returns a non-2XX HTTP result
code.
- notyet
- The user obeyed the documentation, but the author has not
yet gotten around to implementing this feature. (Right now, only TLS
support and sophisticated permissions fall into this category, as well as
the S3::Acl command.)
- xml
- The service has returned invalid XML, or XML whose schema
is unexpected. For the high-level commands that accept service XML as
input for parsing, this may also be thrown.
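As a minimal sketch of using this scheme, the Tcl below catches an error and branches on the second element of $errorCode. The procs fakeS3Call and classifyS3Error are hypothetical: fakeS3Call only raises an error shaped like the ones described above, so the snippet runs without the package, credentials, or a network connection. With the real package, the script passed to classifyS3Error would be a call such as S3::Put.

```tcl
# Hypothetical stand-in for an S3 command: it raises an error whose
# errorCode has the documented shape {S3 category ...}.
proc fakeS3Call {category args} {
    return -code error -errorcode [list S3 $category {*}$args] \
        "simulated $category failure"
}

# Run a script and classify any error by the second element of errorCode.
proc classifyS3Error {script} {
    if {![catch {uplevel 1 $script} msg opts]} {
        return ok
    }
    set code [dict get $opts -errorcode]
    if {[lindex $code 0] ne "S3"} {
        # Not raised by the S3 package: re-raise it unchanged.
        return -options $opts $msg
    }
    switch -- [lindex $code 1] {
        usage   { return "usage error in [lindex $code 2]" }
        local   { return "local error: [lindex $code 2]" }
        socket  { return "communication failure" }
        remote  {
            # The third element is the dict returned by S3::REST.
            set rest [lindex $code 2]
            return "remote error, HTTP [dict get $rest httpstatus]"
        }
        notyet  { return "feature not yet implemented" }
        xml     { return "unexpected XML from the service" }
        default { return "unrecognized S3 error" }
    }
}

puts [classifyS3Error {fakeS3Call remote {httpstatus 403 httpmessage Forbidden}}]
;# prints: remote error, HTTP 403
```

Errors that did not come from the S3 package are re-raised untouched, so the wrapper does not hide unrelated bugs.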
COMMANDS
This package provides several separate levels of complexity.
- The lowest level simply takes arguments to be sent to the
service, sends them, retrieves the result, and provides it to the caller.
Note: This layer allows both synchronous and event-driven
processing. It depends on the md5, sha1, and base64 packages from Tcllib
(available at http://tcllib.sourceforge.net/). Note that
S3::Configure is required for S3::REST to work, because it supplies the
credentials used for authentication, so it is also part of the "lowest
level."
- The next layer parses the results of calls, allowing for
functionality such as uploading only changed files, synchronizing
directories, and so on. This layer depends on the TclXML package as
well as the included xsxp package. These packages are loaded with package
require only when the more sophisticated routines are called, so nothing
breaks if they are not correctly installed.
- Also included is a separate program that uses the library.
It provides code to parse $argv0 and $argv from the command line, allowing
invocation as a tclkit, etc. (Not yet implemented.)
- Another separate program provides a GUI interface allowing
drag-and-drop and other such functionality. (Not yet implemented.)
- Also built on this package is the OddJob program. It is a
separate program designed to allow distribution of computational work
units over Amazon's Elastic Compute Cloud web service.
The goal is to have at least the bottom-most layers implemented in pure Tcl
using only that which comes from widely-available sources, such as Tcllib.
LOW LEVEL COMMANDS
These commands do not require any packages not listed above. They talk directly
to the service, or they are utility or configuration routines. Note that the
"xsxp" package was written to support this package, so it should be
available wherever you got this package.
- S3::Configure ?-reset boolean?
?-retries integer? ?-accesskeyid idstring?
?-secretaccesskey idstring? ?-service-access-point
FQDN? ?-use-tls boolean? ?-default-compare
always|never|exists|missing|newer|date|checksum|different?
?-default-separator string? ?-default-acl
private|public-read|public-read-write|authenticated-read|keep|calc?
?-default-bucket bucketname?
- There is one command for configuration, and that is
S3::Configure. If called with no arguments, it returns a dictionary
of key/value pairs listing all current settings. If called with one
argument, it returns the value of that single argument. If called with two
or more arguments, it must be called with pairs of arguments, and it
applies the changes in order. There is only one set of configuration
information per interpreter.
The following options are accepted:
- -reset boolean
- By default, false. If true, any previous changes and any
changes on the same call before the reset option will be returned to
default values.
- -retries integer
- Default value is 3. If Amazon returns a 500 error, a retry
after an exponential backoff delay will be tried this many times before
finally throwing the 500 error. This applies to each call to
S3::REST from the higher-level commands, but not to S3::REST
itself. That is, S3::REST will always return httpstatus 500 if
that's what it receives. Functions like S3::Put will retry the PUT
call, and will also retry the GET and HEAD calls used to do content
comparison. Changing this to 0 will prevent retries and their associated
delays. In addition, socket errors (i.e., errors whose errorCode starts
with "S3 socket") will be similarly retried after backoffs.
- -accesskeyid idstring
- -secretaccesskey idstring
- Each defaults to an empty string. These must be set before
any calls are made; together they are your S3 credentials. Once you sign up
for an account, go to http://www.amazonaws.com/, sign in, click the "Your Web
Services Account" button, pick "AWS Access Identifiers",
and your access key ID and secret access key will be available. All
S3::REST calls are authenticated. Blame Amazon for the poor choice
of names.
- -service-access-point FQDN
- Defaults to "s3.amazonaws.com". This is the
fully-qualified domain name of the server to contact for S3::REST
calls. You should probably never need to touch this, unless someone else
implements a compatible service, or you wish to test something by pointing
the library at your own service.
- -slop-seconds integer
- When comparing dates between Amazon and the local machine,
two dates within this many seconds of each other are considered the same.
Useful for clock drift correction, processing overhead time, and so
on.
- -use-tls boolean
- Defaults to false. This is not yet implemented. If true,
S3::REST will negotiate a TLS connection to Amazon. If false,
unencrypted connections are used.
- -bucket-prefix string
- Defaults to "TclS3". This string is used by
S3::SuggestBucket if that command is passed an empty string as
an argument. It is used to distinguish different applications using the
Amazon service. Your application should always set this to keep from
interfering with the buckets of other users of Amazon S3 or with other
buckets of the same user.
- -default-compare
always|never|exists|missing|newer|date|checksum|different
- Defaults to "always." If no -compare is specified
on S3::Put, S3::Get, or S3::Delete, this comparison
is used. See those commands for a description of the meaning.
- -default-separator string
- Defaults to "/". This is currently unused. It
might make sense to use this for S3::Push and S3::Pull, but
allowing resources to have slashes in their names that aren't marking
directories would be problematic. Hence, this currently does nothing.
- -default-acl
private|public-read|public-read-write|authenticated-read|keep|calc
- Defaults to an empty string. If no -acl argument is
provided to S3::Put or S3::Push, this string is used (given
as the x-amz-acl header if not keep or calc). If this is also empty, no
x-amz-acl header is generated. This is not used by
S3::REST.
- -default-bucket bucketname
- If no bucket is given to S3::GetBucket,
S3::PutBucket, S3::Get, S3::Put, S3::Head,
S3::Acl, S3::Delete, S3::Push, S3::Pull, or
S3::Toss, and if this configuration variable is not an empty string
(and not simply "/"), then this value will be used for the
bucket. This is useful if one program does a large amount of resource
manipulation within a single bucket.
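The retry policy described under -retries can be sketched as follows. This is not the package's code: the proc withRetries, the 100 ms base delay, and the doubling factor are assumptions for illustration, since the actual backoff schedule isn't documented here. The wrapped script is expected to return a dict shaped like an S3::REST result.

```tcl
# Hypothetical retry wrapper modelled on the documented -retries policy:
# retry when the result has httpstatus 500, or when an error's errorCode
# starts with "S3 socket". The delay (assumed) doubles after each attempt.
proc withRetries {retries script {baseMs 100}} {
    set delay $baseMs
    for {set attempt 0} {1} {incr attempt} {
        set failed [catch {uplevel 1 $script} result opts]
        set retryable 0
        if {$failed} {
            if {[lrange [dict get $opts -errorcode] 0 1] eq {S3 socket}} {
                set retryable 1
            }
        } elseif {[dict exists $result httpstatus] &&
                  [dict get $result httpstatus] == 500} {
            set retryable 1
        }
        if {!$retryable || $attempt >= $retries} {
            # Success, a non-retryable outcome, or retries exhausted.
            if {$failed} { return -options $opts $result }
            return $result
        }
        after $delay    ;# blocking sleep before the next attempt
        set delay [expr {$delay * 2}]
    }
}
```

Passing 0 for retries runs the script exactly once, which corresponds to setting -retries to 0 to avoid retries and their delays.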
- S3::SuggestBucket ?name?
- The S3::SuggestBucket command accepts an optional
string as a prefix and returns a valid bucket name containing the name
argument and the Access Key ID. This makes the name unique to the owner
and to the application (assuming the application picks a good name
argument). If no name is provided, the name from S3::Configure
-bucket-prefix is used. If that too is empty (which is not the
default), an error is thrown.
- S3::REST dict
- The S3::REST command takes as an argument a
dictionary and returns a dictionary. The return dictionary has the same
keys as the input dictionary, and includes additional keys as the result.
The presence or absence of keys in the input dictionary can control the
behavior of the routine. It never throws an error directly, but includes
keys "error", "errorInfo", and "errorCode"
if necessary. Some keys are required, some optional. The routine can run
either in blocking or non-blocking mode, based on the presence of
resultvar in the input dictionary. This requires the
-accesskeyid and -secretaccesskey to be configured via
S3::Configure before being called.
The possible input keys are these:
- verb GET|PUT|DELETE|HEAD
- This required item indicates the verb to be used.
- resource string
- This required item indicates the resource to be accessed. A
leading / is added if not there already. It will be URL-encoded for you if
necessary. Do not supply a resource name that is already URL-encoded.
- ?rtype torrent|acl?
- This indicates a torrent or acl resource is being
manipulated. Do not include this in the resource key, or the
"?" separator will get URL-encoded.
- ?parameters dict?
- This optional dictionary provides parameters added to the
URL for the transaction. The keys must be in the correct case (which is
confusing in the Amazon documentation) and the values must be valid. This
can be an empty dictionary or omitted entirely if no parameters are
desired. No other error checking on parameters is performed.
- ?headers dict?
- This optional dictionary provides headers to be added to
the HTTP request. The keys must be in lower case for the
authentication to work. The values must not contain embedded newlines or
carriage returns. This is primarily useful for adding x-amz-* headers.
Since authentication is calculated by S3::REST, do not add that
header here. Since content-type gets its own key, also do not add that
header here.
- ?inbody contentstring?
- This optional item, if provided, gives the content that
will be sent. It is sent with a transfer encoding of binary, and only the
low bytes are used, so use [encoding convertto utf-8] if the string is a
utf-8 string. This is written all in one blast, so if you are using
non-blocking mode and the inbody is especially large, you may wind
up blocking on the write socket.
- ?infile filename?
- This optional item, if provided, and if inbody is
not provided, names the file from which the body of the HTTP message will
be constructed. The file is opened for reading and sent progressively by
[fcopy], so it should not block in non-blocking mode even if the file is
very large. The file is transferred in binary mode, so the bytes on your
disk will match the bytes in your resource. Due to HTTP restrictions, it
must be possible to use [file size] on this file to determine the size at
the start of the transaction.
- ?S3chan channel?
- This optional item, if provided, indicates the already-open
socket over which the transaction should be conducted. If not provided, a
connection is made to the service access point specified via
S3::Configure, which is normally s3.amazonaws.com. If this is
provided, the channel is not closed at the end of the transaction.
- ?outchan channel?
- This optional item, if provided, indicates the already-open
channel to which the body returned from S3 should be written. That is, to
retrieve a large resource, open a file, set the translation mode, and pass
the channel as the value of the key outchan. Output will be written to the
channel in pieces so memory does not fill up unnecessarily. The channel is
not closed at the end of the transaction.
- ?resultvar varname?
- This optional item, if provided, indicates that
S3::REST should run in non-blocking mode. The varname should
be fully qualified with respect to namespaces and cannot be local to a
proc. If provided, the result of the S3::REST call is assigned to
this variable once everything has completed; use trace or vwait to know
when this has happened. If this key is not provided, the result is simply
returned from the call to S3::REST and no calls to the event loop
are invoked from within this call.
- ?throwsocket throw|return?
- This optional item controls how socket errors are reported.
If it is set to throw, S3::REST throws an error when a socket error
is encountered; if it is set to return, S3::REST returns the error
in the result dictionary instead. If throwsocket is set to return, or if
the call is non-blocking, a socket error (i.e., an error whose error
code starts with "S3 socket") is returned in the dictionary
as error, errorInfo, and errorCode. If a foreground
call is made (i.e., resultvar is not provided) and this option is
either not provided or set to throw, then error is invoked
instead.
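Because the input to S3::REST is an ordinary dict, a request can be assembled with dict create. The sketch below builds a PUT with a UTF-8 body and then waits for a non-blocking result with vwait. The bucket and resource names are made up, and fakeRest is a hypothetical stand-in for S3::REST that only imitates the documented non-blocking contract (return immediately, then assign a copy of the input dict plus result keys to the fully qualified resultvar), since the real call needs credentials and network access.

```tcl
# Build an input dict for a PUT. The body is converted to UTF-8 bytes
# first, because S3::REST sends only the low byte of each character.
set text "caf\u00e9"
set req [dict create \
    verb      PUT \
    resource  /mybucket/notes.txt \
    headers   [dict create x-amz-meta-note demo] \
    inbody    [encoding convertto utf-8 $text] \
    resultvar ::restResult]

# Hypothetical stand-in for S3::REST in non-blocking mode: it returns at
# once, then assigns the result dict from the event loop.
proc fakeRest {req} {
    set var [dict get $req resultvar]
    set result [dict merge $req {httpstatus 200 httpmessage OK}]
    after 10 [list set $var $result]
    return
}

fakeRest $req
vwait ::restResult    ;# run the event loop until the result is assigned
puts [dict get $::restResult httpstatus]    ;# prints: 200
```

Note that the header key is lower case, as the headers entry requires, and that resultvar names a global variable rather than a proc-local one.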
Once the call to S3::REST completes, a new dict is returned, either in the
resultvar or as the result of execution. This dict is a copy of the
original dict with the results added as new keys. The possible new keys are
these:
- error errorstring
- errorInfo errorstring
- errorCode errorstring
- If an error is caught, these three keys will be set in the
result. Note that S3::REST does not consider a non-2XX HTTP
return code as an error. The errorCode value will be formatted
according to the ERROR REPORTING description. If these are present,
other keys described here might not be.
- httpstatus threedigits
- The three-digit code from the HTTP transaction. 2XX for
good, 5XX for server error, etc.
- httpmessage text
- The textual result after the status code, such as "OK" or
"Forbidden".
- outbody contentstring
- If outchan was not specified, this key holds the
(unencoded) contents of the body returned. If Amazon
returned an error (that is, httpstatus is not a 2XX value), the error message
will be in outbody or written to outchan as
appropriate.
- outheaders dict
- This contains a dictionary of headers returned by Amazon.
It's mainly useful for finding the x-amz-meta-* headers, if any, although
things like last-modified and content-type are also useful. The keys of this
dictionary are always lower case. Both keys and values are trimmed of
extraneous whitespace.
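Since S3::REST reports problems through keys rather than by throwing, a caller usually checks the error keys first, then httpstatus, and only then looks at headers. The helper summarizeRest below is a hypothetical sketch that works on an invented result dict; it uses dict filter to pick out the x-amz-meta-* headers.

```tcl
# Hypothetical helper: classify a dict shaped like an S3::REST result.
proc summarizeRest {res} {
    if {[dict exists $res error]} {
        # Local or socket failure; errorCode follows ERROR REPORTING.
        return [list failed [dict get $res errorCode]]
    }
    set status [dict get $res httpstatus]
    if {$status < 200 || $status > 299} {
        # Not an error as far as S3::REST is concerned; the body explains it.
        return [list http $status [dict get $res httpmessage]]
    }
    # Header keys are always lower case, so one lower-case pattern suffices.
    set meta [dict filter [dict get $res outheaders] key x-amz-meta-*]
    return [list ok $meta]
}

# Invented sample result for a successful GET.
set sample {
    verb GET resource /mybucket/notes.txt
    httpstatus 200 httpmessage OK
    outheaders {content-type text/plain x-amz-meta-note demo}
    outbody hello
}
puts [summarizeRest $sample]    ;# prints: ok {x-amz-meta-note demo}
```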
HIGH LEVEL COMMANDS
The routines in this section all make use of one or more calls to
S3::REST to do their work, then parse and manage the data in a
convenient way. All these commands throw errors as described in
ERROR REPORTING unless otherwise noted.
In all these commands, all arguments are presented as name/value pairs, in any
order. All the argument names start with a hyphen.
There are a few options that are common to many of the commands, and those
common options are documented here.
- -blocking boolean
- If provided and specified as false, then any calls to
S3::REST will be non-blocking, and internally these routines will
call [vwait] to get the results. In other words, these routines will
return the same value, but they'll have event loops running while waiting
for Amazon.
- -parse-xml xmlstring
- If provided, the routine skips actually communicating with
Amazon, and instead behaves as if the XML string provided was returned as
the body of the call. Since several of these routines allow the return of
data in various formats, this argument can be used to parse existing XML
to extract the bits of information that are needed. It's also helpful for
testing.
- -bucket bucketname
- Almost every high-level command needs to know what bucket
the resources are in. This option specifies that. (Only the command to
list available buckets does not require this parameter.) This does not
need to be URL-encoded, even if it contains special or non-ASCII
characters. Spaces and slashes are always trimmed from both ends of the
name, and what remains must be a valid bucket name. If this is not supplied,
the value is taken from S3::Configure -default-bucket if that string isn't
empty.
- -resource resourcename
- This specifies the resource of interest within the bucket.
It may or may not start with a slash - both cases are handled. This does
not need to be URL-encoded, even if it contains special or non-ASCII
characters.
- -compare
always|never|exists|missing|newer|date|checksum|different
- When commands copy resources to files or files to
resources, the caller may specify that the copy should be skipped if the
contents are the same. This argument specifies the conditions under which
the files should be copied. If it is not passed, the result of
S3::Configure -default-compare is used, which in turn defaults to
"always." The meanings of the various values are these:
- always
- Always copy the data. This is the default.
- never
- Never copy the data. This is essentially a no-op, except in
S3::Push and S3::Pull where the -delete flag might make a
difference.
- exists
- Copy the data only if the destination already exists.
- missing
- Copy the data only if the destination does not already
exist.
- newer
- Copy the data if the destination is missing, or if the date
on the source is newer than the date on the destination by at least
S3::Configure -slop-seconds seconds. If the source is Amazon, the
date is taken from the Last-Modified header. If the source is local, it is
taken as the mtime of the file. If the source data is specified in a
string rather than a file, it is taken as right now, via [clock seconds].
- date
- Like newer, except copy if the date is newer
or older.
- checksum
- Calculate the MD5 checksum on the local file or string, ask
Amazon for the eTag of the resource, and copy the data if they're
different. Copy the data also if the destination is missing. Note that
this can be slow with large local files unless the C version of the MD5
support is available.
- different
- Copy the data if the destination does not exist. If the
destination exists and an actual file name was specified (rather than a
content string), and the date on the file differs from the date on the
resource, copy the data. If the data is provided as a content string, the
"date" is treated as "right now", so it will likely
always differ unless slop-seconds is large. If the dates are the same, the
MD5 checksums are compared, and the data is copied if the checksums
differ.
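The compare modes reduce to a small decision function. The sketch below illustrates the meanings listed above and is not the package's implementation: the proc shouldCopy is hypothetical and takes facts the package would gather itself (whether the destination exists, both modification times in seconds, and both MD5 checksums), with slop standing in for S3::Configure -slop-seconds. It treats dates within slop seconds as equal, so newer requires the source to be more than slop seconds newer.

```tcl
# Hypothetical decision function for the -compare modes.
# srcTime/dstTime: seconds since the epoch; srcSum/dstSum: MD5 hex strings.
# slop plays the role of S3::Configure -slop-seconds.
proc shouldCopy {mode dstExists srcTime dstTime srcSum dstSum {slop 0}} {
    # Dates within slop seconds of each other count as the same.
    set sameDate [expr {abs($srcTime - $dstTime) <= $slop}]
    switch -- $mode {
        always   { return 1 }
        never    { return 0 }
        exists   { return [expr {$dstExists ? 1 : 0}] }
        missing  { return [expr {$dstExists ? 0 : 1}] }
        newer    { return [expr {!$dstExists || $srcTime - $dstTime > $slop}] }
        date     { return [expr {!$dstExists || !$sameDate}] }
        checksum { return [expr {!$dstExists || $srcSum ne $dstSum}] }
        different {
            # Copy if missing, if the dates differ, or if the checksums differ.
            return [expr {!$dstExists || !$sameDate || $srcSum ne $dstSum}]
        }
        default  { error "unknown compare mode: $mode" }
    }
}
```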
Note that "newer" and "date" don't care about the contents,
and "checksum" doesn't care about the dates, but
"different" checks both.
- S3::ListAllMyBuckets ?-blocking boolean?
?-parse-xml xmlstring? ?-result-type
REST|xml|pxml|dict|names|owner?
- This routine performs a GET on the Amazon S3 service, which
is defined to return a list of buckets owned by the account identified by
the authorization header. (Blame Amazon for the dumb names.)
- -blocking boolean
- See above for standard definition.
- -parse-xml xmlstring
- See above for standard definition.
- -result-type REST
- The dictionary returned by S3::REST is the return
value of S3::ListAllMyBuckets. In this case, a non-2XX httpstatus
will not throw an error. You may not combine this with
-parse-xml.
- -result-type xml
- The raw XML of the body is returned as the result (with no
encoding applied).
- -result-type pxml
- The XML of the body as parsed by xsxp::parse is
returned.
- -result-type dict
- A dictionary of interesting portions of the XML is
returned. The dictionary contains the following keys:
- Owner/ID
- The Amazon AWS ID (in hex) of the owner of the bucket.
- Owner/DisplayName
- The Amazon AWS ID's Display Name.
- Bucket/Name
- A list of names, one for each bucket.
- Bucket/CreationDate
- A list of dates, one for each bucket, in the same order as
Bucket/Name, in ISO format (as returned by Amazon).
- -result-type names
- A list of bucket names is returned with all other
information stripped out. This is the default result type for this
command.
- -result-type owner
- A list containing two elements is returned. The first
element is the owner's ID, and the second is the owner's display
name.
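With -result-type dict, Bucket/Name and Bucket/CreationDate are parallel lists, so a two-list foreach walks them in step. The dict below is invented sample data standing in for the command's result.

```tcl
# Invented sample data shaped like S3::ListAllMyBuckets -result-type dict.
set buckets {
    Owner/ID 0123abcd
    Owner/DisplayName someone
    Bucket/Name {alpha beta}
    Bucket/CreationDate {2006-02-03T16:45:09.000Z 2006-03-01T08:00:00.000Z}
}

# Walk the two parallel lists together.
set rows {}
foreach name [dict get $buckets Bucket/Name] \
        created [dict get $buckets Bucket/CreationDate] {
    lappend rows "$name created $created"
}
puts [join $rows \n]
```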
- S3::PutBucket ?-bucket bucketname?
?-blocking boolean? ?-acl
{}|private|public-read|public-read-write|authenticated-read?
- This command creates a bucket if it does not already exist.
Bucket names are globally unique, so you may get a "Forbidden"
error from Amazon even if you cannot see the bucket in
S3::ListAllMyBuckets. See S3::SuggestBucket for ways to
minimize this risk. The x-amz-acl header comes from the -acl
option, or from S3::Configure -default-acl if not specified.
- S3::DeleteBucket ?-bucket bucketname? ?-blocking boolean?
- This command deletes a bucket if it is empty and you have
such permission. Note that Amazon's list of buckets is a global resource,
requiring far-flung synchronization. If you delete a bucket, it may be
quite a few minutes (or hours) before you can recreate it, yielding
"Conflict" errors until then.
- S3::GetBucket ?-bucket bucketname?
?-blocking boolean? ?-parse-xml xmlstring?
?-max-count integer? ?-prefix prefixstring?
?-delimiter delimiterstring? ?-result-type
REST|xml|pxml|names|dict?
- This lists the contents of a bucket. That is, it returns a
directory listing of resources within a bucket, rather than transferring
any user data.
- -bucket bucketname
- The standard bucket argument.
- -blocking boolean
- The standard blocking argument.
- -parse-xml xmlstring
- The standard parse-xml argument.
- -max-count integer
- If supplied, this is the maximum number of records to be
returned. If not supplied, the code will iterate until all records have
been found. Not compatible with -parse-xml. Note that if this is supplied,
only one call to S3::REST will be made. Otherwise, enough calls
will be made to exhaust the listing, buffering results in memory, so take
care if you may have huge buckets.
- -prefix prefixstring
- If present, restricts listing to resources with a
particular prefix. One leading / is stripped if present.
- -delimiter delimiterstring
- If present, specifies a delimiter for the listing. The
presence of this will summarize multiple resources into one entry, as if
S3 supported directories. See the Amazon documentation for details.
- -result-type REST|xml|pxml|names|dict
- This indicates the format of the return result of the
command.
- REST
- If -max-count is specified, the dictionary returned
from S3::REST is returned. If -max-count is not specified, a
list of all the dictionaries returned from the one or more calls to
S3::REST is returned.
- xml
- If -max-count is specified, the body returned from
S3::REST is returned. If -max-count is not specified, a list
of all the bodies returned from the one or more calls to S3::REST
is returned.
- pxml
- If -max-count is specified, the body returned from
S3::REST is passed through xsxp::parse and then returned.
If -max-count is not specified, a list of all the bodies returned
from the one or more calls to S3::REST are each passed through
xsxp::parse and then returned.
- names
- Returns a list of all names found in either the
Contents/Key fields or the CommonPrefixes/Prefix fields. If no
-delimiter is specified and no -max-count is specified, this
returns a list of all resources with the specified -prefix.
- dict
- Returns a dictionary. (Returns only one dictionary even if
-max-count wasn't specified.) The keys of the dictionary are as
follows:
- Name
- The name of the bucket (from the final call to
S3::REST).
- Prefix
- From the final call to S3::REST.
- Marker
- From the final call to S3::REST.
- MaxKeys
- From the final call to S3::REST.
- IsTruncated
- From the final call to S3::REST, so always false if
-max-count is not specified.
- NextMarker
- Always provided if IsTruncated is true, and calculated if
Amazon does not provide it. May be empty if IsTruncated is false.
- Key
- A list of names of resources in the bucket matching the
-prefix and -delimiter restrictions.
- LastModified
- A list of times of resources in the bucket, in the same
order as Key, in the format returned by Amazon. (I.e., it is not parsed
into a seconds-from-epoch.)
- ETag
- A list of entity tags (a.k.a. MD5 checksums) in the same
order as Key.
- Size
- A list of sizes in bytes of the resources, in the same
order as Key.
- Owner/ID
- A list of owners of the resources in the bucket, in the
same order as Key.
- Owner/DisplayName
- A list of owners of the resources in the bucket, in the
same order as Key. These are the display names.
- CommonPrefixes/Prefix
- A list of prefixes common to multiple entities. This is
present only if -delimiter was supplied.
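The dict result is likewise a set of parallel lists aligned with Key. As a sketch over invented sample data, the loop below totals Size and finds the most recently modified key. It relies on the LastModified strings all having the same ISO 8601 UTC format, in which case plain string comparison orders them chronologically; that format is an assumption about what Amazon returns.

```tcl
# Invented sample data shaped like S3::GetBucket -result-type dict.
set listing {
    Name mybucket Prefix photos/ Marker {} MaxKeys 1000 IsTruncated false
    Key          {photos/a.jpg photos/b.jpg photos/c.jpg}
    LastModified {2007-01-05T10:00:00.000Z 2007-03-02T09:30:00.000Z 2006-12-31T23:59:59.000Z}
    Size         {1024 2048 512}
}

set total 0
set newestKey {}
set newestTime {}
foreach key  [dict get $listing Key] \
        mod  [dict get $listing LastModified] \
        size [dict get $listing Size] {
    incr total $size
    # Same-format ISO 8601 UTC timestamps sort correctly as plain strings.
    if {$newestTime eq {} || [string compare $mod $newestTime] > 0} {
        set newestTime $mod
        set newestKey $key
    }
}
puts "$total bytes; newest is $newestKey"    ;# prints: 3584 bytes; newest is photos/b.jpg
```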
- S3::Put ?-bucket bucketname?
-resource resourcename ?-blocking boolean?
?-file filename? ?-content contentstring?
?-acl
private|public-read|public-read-write|authenticated-read|calc|keep?
?-content-type contenttypestring? ?-x-amz-meta-*
metadatatext? ?-compare comparemode?
- This command sends data to a resource on Amazon's servers
for storage, using the HTTP PUT command. It returns 0 if the
-compare mode prevented the transfer, 1 if the transfer worked, or
throws an error if the transfer was attempted but failed. Server 5XX
errors and S3 socket errors are retried according to S3::Configure
-retries settings before throwing an error; other errors throw
immediately.
- -bucket
- This specifies the bucket into which the resource will be
written. Leading and/or trailing slashes are removed for you, as are
spaces.
- -resource
- This is the full name of the resource within the bucket. A
single leading slash is removed, but not a trailing slash. Spaces are not
trimmed.
- -blocking
- The standard blocking flag.
- -file
- If this is specified, the filename must exist, must
be readable, and must not be a special or directory file. [file size] must
apply to it and must not change for the lifetime of the call. The default
content-type is calculated based on the name and/or contents of the file.
Specifying this is an error if -content is also specified, but at
least one of -file or -content must be specified. (The file
is allowed to not exist or not be readable if -compare never
is specified.)
- -content
- If this is specified, the contentstring is sent as
the body of the resource. The content-type defaults to
"application/octet-string". Only the low bytes are sent, so
non-ASCII should use the appropriate encoding (such as [encoding convertto
utf-8]) before passing it to this routine, if necessary. Specifying this
is an error if -file is also specified, but at least one of
-file or -content must be specified.
- -acl
- This defaults to S3::Configure -default-acl if not
specified. It sets the x-amz-acl header on the PUT operation. If the value
provided is calc, the x-amz-acl header is calculated based on the
I/O permissions of the file to be uploaded; it is an error to specify
calc and -content. If the value provided is keep, the
acl of the resource is read before the PUT (or the default is used if the
resource does not exist), then set back to what it was after the PUT (if
it existed). An error will occur if the resource is successfully written
but the kept ACL cannot be then applied. This should never happen.
Note: calc is not currently fully implemented.
- -x-amz-meta-*
- If any option starts with "-x-amz-meta-", its
contents are added to the PUT command to be stored as metadata with the
resource. Again, no encoding is performed, and the metadata should not
contain characters like newlines, carriage returns, and so on. It is best
to stick with simple ASCII strings, or to fix the library in several
places.
- -content-type
- This overrides the content-type calculated by -file
or sets the content-type for -content.
- -compare
- This is the standard compare mode argument. S3::Put
returns 1 if the data was copied or 0 if the comparison mode indicated
that the transfer should be skipped.
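Since -content transmits only the low byte of each character, a Tcl string holding non-ASCII text has to be encoded before the call. A minimal sketch of that step follows. The helper name putTextArgs and the "text/plain; charset=utf-8" content type are illustrative assumptions, not part of the S3 package; the helper only builds the argument list, so it can be checked without network access.

```tcl
# Hypothetical helper: build S3::Put arguments for a text body.
# -content sends only the low byte of each character, so the string
# is first converted to UTF-8 bytes with [encoding convertto].
proc putTextArgs {bucket resource text} {
    set bytes [encoding convertto utf-8 $text]
    return [list \
        -bucket $bucket \
        -resource $resource \
        -content $bytes \
        -content-type "text/plain; charset=utf-8"]
}

# Usage (requires a configured S3 package and network access):
#   package require S3
#   S3::Put {*}[putTextArgs mybucket notes/hello.txt "caf\u00e9"]
```

Overriding -content-type here avoids the generic default for -content uploads.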
- S3::Get ?-bucket bucketname?
-resource resourcename ?-blocking boolean?
?-compare comparemode? ?-file filename?
?-content contentvarname? ?-timestamp aws|now?
?-headers headervarname?
- This command retrieves data from a resource on Amazon's S3
servers, using the HTTP GET command. It returns 0 if the -compare
mode prevented the transfer, 1 if the transfer worked, or throws an error
if the transfer was attempted but failed. Server 5XX errors and S3 socket
errors are retried according to S3::Configure -retries settings before
throwing an error; other errors throw immediately. Note that this is
always authenticated as the user configured via S3::Configure
-accesskeyid. Use the Tcl http package for unauthenticated GETs.
- -bucket
- This specifies the bucket from which the resource will be
read. Leading and/or trailing slashes are removed for you, as are
spaces.
- -resource
- This is the full name of the resource within the bucket. A
single leading slash is removed, but not a trailing slash. Spaces are not
trimmed.
- -blocking
- The standard blocking flag.
- -file
- If this is specified, the body of the resource will be read
into this file, incrementally without pulling it entirely into memory
first. The parent directory must already exist. If the file already
exists, it must be writable. If an error is thrown part-way through the
process and the file already existed, it may be clobbered. If an error is
thrown part-way through the process and the file did not already exist,
any partial bits will be deleted. Specifying this is an error if
-content is also specified, but at least one of -file or
-content must be specified.
- -timestamp
- This is only valid in conjunction with -file. It may
be specified as now or aws. The default is now. If
now, the file's modification date is left up to the system. If
aws, the file's mtime is set to match the Last-Modified header on
the resource, synchronizing the two appropriately for -compare
date or -compare newer.
- -content
- If this is specified, the contentvarname is a
variable in the caller's scope (not necessarily global) that receives the
value of the body of the resource. No encoding is done, so if the resource
(for example) represents a UTF-8 byte sequence, use [encoding convertfrom
utf-8] to decode it into a Tcl string. If this is specified, the
-compare option is ignored unless it is never, in which case no
assignment to contentvarname is performed. Specifying this is an
error if -file is also specified, but at least one of -file
or -content must be specified.
- -compare
- This is the standard compare mode argument. S3::Get
returns 1 if the data was copied or 0 if the comparison mode indicated
that the transfer should be skipped.
- -headers
- If this is specified, the headers resulting from the fetch
are stored in the provided variable, as a dictionary. This will include
content-type and x-amz-meta-* headers, as well as the usual HTTP headers,
the x-amz-id debugging headers, and so on. If no file is fetched (due to
-compare or other errors), no assignment to this variable is
performed.
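As a sketch of consuming the -headers result of S3::Get, the hypothetical helper below extracts the x-amz-meta-* entries from the header dictionary and strips their prefix. The letter case of header names in that dictionary is not specified above, so the sketch matches case-insensitively; treat that as an assumption.

```tcl
# Hypothetical helper: pick the x-amz-meta-* entries out of the header
# dictionary stored by S3::Get -headers. Matching is case-insensitive
# because the case of header names is assumed, not documented.
proc metaFromHeaders {headers} {
    set meta [dict create]
    dict for {name value} $headers {
        if {[string match -nocase x-amz-meta-* $name]} {
            # Drop the 11-character "x-amz-meta-" prefix.
            dict set meta [string range $name 11 end] $value
        }
    }
    return $meta
}

# Usage (requires network access):
#   if {[S3::Get -bucket mybucket -resource notes/hello.txt \
#            -content body -headers hdrs]} {
#       set text [encoding convertfrom utf-8 $body]
#       set meta [metaFromHeaders $hdrs]
#   }
```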
- S3::Head ?-bucket bucketname?
-resource resourcename ?-blocking boolean?
?-dict dictvarname? ?-headers headersvarname?
?-status statusvarname?
- This command requests HEAD from the resource. It returns
whether a 2XX code was returned as a result of the request, never throwing
an S3 remote error. That is, if this returns 1, the resource exists and is
accessible. If this returns 0, something went wrong, and the
-status result can be consulted for details.
- -bucket
- This specifies the bucket from which the resource will be
read. Leading and/or trailing slashes are removed for you, as are
spaces.
- -resource
- This is the full name of the resource within the bucket. A
single leading slash is removed, but not a trailing slash. Spaces are not
trimmed.
- -blocking
- The standard blocking flag.
- -dict
- If specified, the resulting dictionary from the
S3::REST call is assigned to the indicated (not necessarily global)
variable in the caller's scope.
- -headers
- If specified, the dictionary of headers from the result is
assigned to the indicated (not necessarily global) variable in the
caller's scope.
- -status
- If specified, the indicated (not necessarily global)
variable in the caller's scope is assigned a 2-element list. The first
element is the 3-digit HTTP status code, while the second element is the
HTTP message (such as "OK" or "Forbidden").
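The two-element -status list can be reduced to a coarse outcome. The classifyStatus helper and its category names below are hypothetical. Any 2XX code counts as success, which also covers the 204 that Amazon returns for a successful DELETE, so the same helper could be applied to S3::Delete -status.

```tcl
# Hypothetical helper: classify the {code message} list stored by
# S3::Head -status (or S3::Delete -status).
proc classifyStatus {status} {
    lassign $status code message
    switch -glob -- $code {
        2??     {return ok}
        404     {return missing}
        403     {return forbidden}
        default {return "other: $code $message"}
    }
}

# Usage (requires network access):
#   if {![S3::Head -bucket mybucket -resource notes/hello.txt -status st]} {
#       puts "HEAD failed: [classifyStatus $st]"
#   }
```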
- S3::GetAcl ?-blocking boolean?
?-bucket bucketname? -resource resourcename
?-result-type REST|xml|pxml?
- This command gets the ACL of the indicated resource or
throws an error if it is unavailable.
- -blocking boolean
- See above for standard definition.
- -bucket
- This specifies the bucket from which the resource will be
read. Leading and/or trailing slashes are removed for you, as are
spaces.
- -resource
- This is the full name of the resource within the bucket. A
single leading slash is removed, but not a trailing slash. Spaces are not
trimmed.
- -parse-xml xml
- The XML from a previous S3::GetAcl call can be passed in to be
parsed into dictionary form. In this case, -result-type must be pxml or
dict.
- -result-type REST
- The dictionary returned by S3::REST is the return
value of S3::GetAcl. In this case, a non-2XX httpstatus will not
throw an error.
- -result-type xml
- The raw XML of the body is returned as the result (with no
encoding applied).
- -result-type pxml
- The XML of the body as parsed by xsxp::parse is
returned.
- -result-type dict
- This fetches the ACL, parses it, and returns a dictionary
of two elements.
The first element has the key "owner" whose value is the canonical
ID of the owner of the resource.
The second element has the key "acl" whose value is a dictionary.
Each key in the dictionary is one of Amazon's permissions, namely
"READ", "WRITE", "READ_ACP",
"WRITE_ACP", or "FULL_CONTROL". Each value of each key
is a list of canonical IDs or group URLs that have that permission.
Elements are not in the list in any particular order, and not all keys are
necessarily present. Display names are not returned, as they are not
especially useful; use pxml to obtain them if necessary.
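With -result-type dict, checking a permission is plain dictionary work. This hypothetical helper (not part of the package) reports whether Amazon's AllUsers group, identified by its standard group URL, holds READ or FULL_CONTROL. Because not all permission keys are necessarily present, each lookup is guarded with dict exists.

```tcl
# Hypothetical helper: does the dictionary returned by
# S3::GetAcl -result-type dict grant READ or FULL_CONTROL to everyone?
proc isPublicRead {aclResult} {
    set allUsers http://acs.amazonaws.com/groups/global/AllUsers
    set acl [dict get $aclResult acl]
    foreach perm {READ FULL_CONTROL} {
        # Not every permission key is necessarily present.
        if {[dict exists $acl $perm]
                && [lsearch -exact [dict get $acl $perm] $allUsers] >= 0} {
            return 1
        }
    }
    return 0
}

# Usage (requires network access):
#   set r [S3::GetAcl -bucket mybucket -resource notes/hello.txt \
#              -result-type dict]
#   if {[isPublicRead $r]} { puts "world-readable" }
```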
- S3::PutAcl ?-blocking boolean?
?-bucket bucketname? -resource resourcename
?-acl new-acl?
- This sets the ACL on the indicated resource. It returns the
XML written to the ACL, or throws an error if anything went wrong.
- -blocking boolean
- See above for standard definition.
- -bucket
- This specifies the bucket from which the resource will be
read. Leading and/or trailing slashes are removed for you, as are
spaces.
- -resource
- This is the full name of the resource within the bucket. A
single leading slash is removed, but not a trailing slash. Spaces are not
trimmed.
- -owner
- If this is provided, it is assumed to match the owner of
the resource. Otherwise, a GET may need to be issued against the resource
to find the owner. If you already have the owner (such as from a call to
S3::GetAcl), you can pass the value of the "owner" key as
the value of this option, and it will be used in the construction of the
XML.
- -acl
- If this option is specified, it provides the ACL the caller
wishes to write to the resource. If this is not supplied or is empty, the
value is taken from S3::Configure -default-acl. The ACL is written
with a PUT to the ?acl resource.
If the value passed to this option starts with "<", it is taken
to be a body to be PUT to the ACL resource.
If the value matches one of the standard Amazon x-amz-acl headers (i.e., a
canned access policy), that header is translated to XML and then applied.
The canned access policies are private, public-read, public-read-write,
and authenticated-read (in lower case).
Otherwise, the value is assumed to be a dictionary formatted as the
"acl" sub-entry within the dict returned by S3::GetAcl
-result-type dict. The proper XML is generated and applied to the
resource. Note that a value containing "//" is assumed to be a
group, a value containing "@" is assumed to be an
AmazonCustomerByEmail, and otherwise the value is assumed to be a
canonical Amazon ID.
Note that you cannot change the owner, so calling GetAcl on a resource owned
by one user and applying it via PutAcl on a resource owned by another user
may not do exactly what you expect.
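Because S3::PutAcl accepts the same dictionary form as the "acl" entry of S3::GetAcl -result-type dict, a read-modify-write cycle needs only a small helper. The grantPermission proc below is a hypothetical sketch; the e-mail grantee in the usage comment relies on the "@" rule described above, and passing -owner reuses the owner already fetched.

```tcl
# Hypothetical helper: add a grantee to one permission of an ACL
# dictionary shaped like the "acl" entry from S3::GetAcl -result-type dict.
proc grantPermission {acl perm grantee} {
    set holders {}
    if {[dict exists $acl $perm]} {
        set holders [dict get $acl $perm]
    }
    # Avoid listing the same grantee twice.
    if {[lsearch -exact $holders $grantee] < 0} {
        lappend holders $grantee
    }
    dict set acl $perm $holders
    return $acl
}

# Usage (requires network access):
#   set current [S3::GetAcl -bucket mybucket -resource notes/hello.txt \
#                    -result-type dict]
#   set acl [grantPermission [dict get $current acl] READ someone@example.com]
#   S3::PutAcl -bucket mybucket -resource notes/hello.txt \
#       -owner [dict get $current owner] -acl $acl
```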
- S3::Delete ?-bucket bucketname?
-resource resourcename ?-blocking boolean?
?-status statusvar?
- This command deletes the specified resource from the
specified bucket. It returns 1 if the resource was deleted successfully, 0
otherwise. It returns 0 rather than throwing an S3 remote error.
- -bucket
- This specifies the bucket from which the resource will be
deleted. Leading and/or trailing slashes are removed for you, as are
spaces.
- -resource
- This is the full name of the resource within the bucket. A
single leading slash is removed, but not a trailing slash. Spaces are not
trimmed.
- -blocking
- The standard blocking flag.
- -status
- If specified, the indicated (not necessarily global)
variable in the caller's scope is set to a two-element list. The first
element is the 3-digit HTTP status code. The second element is the HTTP
message (such as "OK" or "Forbidden"). Note that
Amazon's DELETE result is 204 on success, that being the code indicating
no content in the returned body.
- S3::Push ?-bucket bucketname?
-directory directoryname ?-prefix prefixstring?
?-compare comparemode? ?-x-amz-meta-*
metastring? ?-acl aclcode? ?-delete
boolean? ?-error throw|break|continue?
?-progress scriptprefix?
- This synchronises a local directory with a remote bucket by
pushing the differences using S3::Put. Note that if something has
changed in the bucket but not locally, those changes could be lost. Thus,
this is not a general two-way synchronization primitive. (See
S3::Sync for that.) Note too that resource names are case
sensitive, so changing the case of a file on a Windows machine may lead to
otherwise-unnecessary transfers. Note that only regular files are
considered, so devices, pipes, symlinks, and directories are not
copied.
- -bucket
- This names the bucket into which data will be pushed.
- -directory
- This names the local directory from which files will be
taken. It must exist, be readable via [glob] and so on. If only some of
the files therein are readable, S3::Push will PUT those files that
are readable and return in its results the list of files that could not be
opened.
- -prefix
- This names the prefix that will be added to all resources.
That is, it is the remote equivalent of -directory. If it is not
specified, the root of the bucket will be treated as the remote directory.
An example may clarify.
S3::Push -bucket test -directory /tmp/xyz -prefix hello/world
- In this example, /tmp/xyz/pdq.html will be stored as
http://s3.amazonaws.com/test/hello/world/pdq.html in Amazon's servers.
Also, /tmp/xyz/abc/def/Hello will be stored as
http://s3.amazonaws.com/test/hello/world/abc/def/Hello in Amazon's
servers. Without the -prefix option, /tmp/xyz/pdq.html would be
stored as http://s3.amazonaws.com/test/pdq.html.
- -blocking
- This is the standard blocking option.
- -compare
- If present, this is passed to each invocation of
S3::Put. Naturally, S3::Configure -default-compare is used
if this is not specified.
- -x-amz-meta-*
- If present, this is passed to each invocation of
S3::Put. All copied files will have the same metadata.
- -acl
- If present, this is passed to each invocation of
S3::Put.
- -delete
- This defaults to false. If true, resources in the
destination that are not in the source directory are deleted with
S3::Delete. Since only regular files are considered, the existence
of a symlink, pipe, device, or directory in the local source will
not prevent the deletion of a remote resource with a corresponding
name.
- -error
- This controls the behavior of S3::Push in the event
that S3::Put throws an error. Note that errors encountered on the
local file system or in reading the list of resources in the remote bucket
always throw errors. This option allows control over "partial"
errors, when some files were copied and some were not. Once started, the
S3::Delete phase always runs to completion, with errors simply recorded
in the return result.
- throw
- The error is rethrown with the same errorCode.
- break
- Processing stops without throwing an error, the error is
recorded in the return value, and the command returns with a normal
return. The calls to S3::Delete are not started.
- continue
- This is the default. Processing continues without throwing,
recording the error in the return result, and resuming with the next file
in the local directory to be copied.
- -progress
- If this is specified and the indicated script prefix is not
empty, the indicated script prefix will be invoked several times in the
caller's context with additional arguments at various points in the
processing. This allows progress reporting without backgrounding. The
provided prefix will be invoked with additional arguments, with the first
additional argument indicating what part of the process is being reported
on. The prefix is initially invoked with args as the first
additional argument and a dictionary representing the normalized arguments
to the S3::Push call as the second additional argument. Then the
prefix is invoked with local as the first additional argument and a
list of suffixes of the files to be considered as the second argument.
Then the prefix is invoked with remote as the first additional
argument and a list of suffixes existing in the remote bucket as the
second additional argument. Then, for each file in the local list, the
prefix will be invoked with start as the first additional argument
and the common suffix as the second additional argument. When
S3::Put returns for that file, the prefix will be invoked with
copy as the first additional argument, the common suffix as the
second additional argument, and a third argument that will be
"copied" (if S3::Put sent the resource),
"skipped" (if S3::Put decided not to based on
-compare), or the errorCode that S3::Put threw due to
unexpected errors (in which case the third argument is a list that starts
with "S3"). When all files have been transferred, the prefix may
be invoked zero or more times with delete as the first additional
argument and the suffix of the resource being deleted as the second
additional argument, with a third argument being either an empty string
(if the delete worked) or the errorCode from S3::Delete if it
failed. Finally, the prefix will be invoked with finished as the
first additional argument and the return value as the second additional
argument.
- The return result from this command is a dictionary. The
keys are the suffixes (i.e., the common portion of the path after the
-directory and -prefix), while the values are either
"copied", "skipped" (if -compare indicated not
to copy the file), or the errorCode thrown by S3::Put, as
appropriate. If -delete was true, there may also be entries for
suffixes with the value "deleted" or "notdeleted",
indicating whether the attempted S3::Delete worked or not,
respectively. There is one additional pair in the return result, whose key
is the empty string and whose value is a nested dictionary. The keys of
this nested dictionary include "filescopied" (the number of
files successfully copied), "bytescopied" (the number of data
bytes in the files copied, excluding headers, metadata, etc.),
"compareskipped" (the number of files not copied due to
-compare mode), "errorskipped" (the number of files not
copied due to thrown errors), "filesdeleted" (the number of
resources deleted due to not having corresponding files locally, or 0 if
-delete is false), and "filesnotdeleted" (the number of
resources whose deletion was attempted but failed).
Note that this is currently implemented somewhat inefficiently. It fetches
the bucket listing (including timestamps and eTags), then calls
S3::Put, which uses HEAD to find the timestamps and eTags again.
Correcting this with no API change is planned for a future upgrade.
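The -progress protocol described above can be handled by a single proc that dispatches on the first additional argument. In this hypothetical sketch the script prefix is the proc name plus the name of a global log variable, so each invocation appends one human-readable line; the log format is an arbitrary choice. Since S3::Pull is documented as using the same protocol, the same callback should work there too.

```tcl
# Hypothetical -progress callback: the script prefix is
# [list pushProgress ::pushLog], so S3::Push appends the phase name
# and one or two further arguments after the log variable name.
proc pushProgress {logVar phase args} {
    upvar #0 $logVar log
    switch -- $phase {
        args     {lappend log "starting"}
        local    {lappend log "[llength [lindex $args 0]] local files"}
        remote   {lappend log "[llength [lindex $args 0]] remote resources"}
        start    {lappend log "sending [lindex $args 0]"}
        copy {
            # Outcome is "copied", "skipped", or an errorCode list.
            lassign $args suffix outcome
            lappend log "$suffix: $outcome"
        }
        delete {
            # An empty third argument means the delete worked.
            lassign $args suffix err
            if {$err eq ""} {
                lappend log "deleted $suffix"
            } else {
                lappend log "failed to delete $suffix"
            }
        }
        finished {lappend log "done"}
    }
}

# Usage (requires network access):
#   set ::pushLog {}
#   S3::Push -bucket test -directory /tmp/xyz -prefix hello/world \
#       -progress [list pushProgress ::pushLog]
```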
- S3::Pull ?-bucket bucketname?
-directory directoryname ?-prefix prefixstring?
?-blocking boolean? ?-compare comparemode?
?-delete boolean? ?-timestamp aws|now?
?-error throw|break|continue? ?-progress
scriptprefix?
- This synchronises a remote bucket with a local directory by
pulling the differences using S3::Get. If something has been changed
locally but not in the bucket, those differences may be lost. This is not a
general two-way synchronization mechanism. (See S3::Sync for that.)
This creates directories if needed; new directories are created with
default permissions. Note that resource names are case sensitive, so
changing the case of a file on a Windows machine may lead to
otherwise-unnecessary transfers. Also, try not to store data in resources
that end with a slash, or which are prefixes of resources that otherwise
would start with a slash; i.e., don't use this if you store data in
resources whose names have to be directories locally.
Note that this is currently implemented somewhat inefficiently. It fetches
the bucket listing (including timestamps and eTags), then calls
S3::Get, which uses HEAD to find the timestamps and eTags again.
Correcting this with no API change is planned for a future upgrade.
- -bucket
- This names the bucket from which data will be pulled.
- -directory
- This names the local directory into which files will be
written. It must exist, be readable via [glob], writable for file creation,
and so on. If only some of the files therein are writable, S3::Pull
will GET those files that are writable and return in its results the list
of files that could not be opened.
- -prefix
- The prefix of resources that will be considered for
retrieval. See S3::Push for more details, examples, etc. (Of
course, S3::Pull reads rather than writes, but the prefix is
treated similarly.)
- -blocking
- This is the standard blocking option.
- -compare
- This is passed to each invocation of S3::Get if
provided. Naturally, S3::Configure -default-compare is used if this
is not provided.
- -timestamp
- This is passed to each invocation of S3::Get if
provided.
- -delete
- If this is specified and true, files that exist in the
-directory that are not in the -prefix will be deleted after
all resources have been copied. In addition, empty directories (other than
the top-level -directory) will be deleted, as Amazon S3 has no
concept of an empty directory.
- -error
- See S3::Push for a description of this option.
- -progress
- See S3::Push for a description of this option. It
differs slightly in that local directories may be included with a trailing
slash to indicate they are directories.
- The return value from this command is a dictionary. It is
identical in form and meaning to the description of the return result of
S3::Push. It differs only in that directories may be included, with
a trailing slash in their name, if they are empty and get deleted.
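Because the return dictionaries of S3::Push and S3::Pull share one format, a single helper can pick out the suffixes that need attention. This is a hypothetical sketch: it treats any value other than copied, skipped, or deleted (that is, notdeleted or an errorCode list) as a failure, and skips the statistics entry stored under the empty key.

```tcl
# Hypothetical helper: list the suffixes in an S3::Push or S3::Pull
# result dictionary whose outcome was not a success.
proc syncFailures {result} {
    set failed {}
    dict for {suffix outcome} $result {
        # The empty key holds the nested statistics dictionary.
        if {$suffix eq ""} {
            continue
        }
        # "notdeleted" or an errorCode list counts as a failure.
        if {$outcome ni {copied skipped deleted}} {
            lappend failed $suffix
        }
    }
    return $failed
}

# Usage (requires network access):
#   set r [S3::Pull -bucket sample -prefix of/interest -directory ./tempfiles]
#   puts "files copied: [dict get $r {} filescopied]"
#   foreach s [syncFailures $r] { puts "needs attention: $s" }
```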
- S3::Toss ?-bucket bucketname?
-prefix prefixstring ?-blocking boolean?
?-error throw|break|continue? ?-progress
scriptprefix?
- This deletes some or all resources within a bucket. It
would be considered a "recursive delete" had Amazon implemented
actual directories.
- -bucket
- The bucket from which resources will be deleted.
- -blocking
- The standard blocking option.
- -prefix
- The prefix for resources to be deleted. Any resource that
starts with this string will be deleted. This is required. To delete
everything in the bucket, pass an empty string for the prefix.
- -error
- If this is "throw", S3::Toss rethrows any
errors it encounters. If this is "break", S3::Toss
returns with a normal return after the first error, recording that error
in the return result. If this is "continue", which is the
default, S3::Toss continues on and lists all errors in the return
result.
- -progress
- If this is specified and not an empty string, the script
prefix will be invoked several times in the context of the caller with
additional arguments appended. Initially, it will be invoked with the
first additional argument being args and the second being the
processed list of arguments to S3::Toss. Then it is invoked with
remote as the first additional argument and the list of suffixes in
the bucket to be deleted as the second additional argument. Then it is
invoked with the first additional argument being delete and the
second additional argument being the suffix deleted and the third
additional argument being "deleted" or "notdeleted"
depending on whether S3::Delete threw an error. Finally, the script
prefix is invoked with a first additional argument of "finished"
and a second additional argument of the return value.
- The return value is a dictionary. The keys are the suffixes
of files that S3::Toss attempted to delete, and the values are
either the string "deleted" or "notdeleted". There is
also one additional pair, whose key is the empty string and whose value is
an embedded dictionary. The keys of this embedded dictionary include
"filesdeleted" and "filesnotdeleted", each of which
has integer values.
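Because S3::Toss deletes every resource whose name merely starts with -prefix, it can be worth previewing what a prefix selects before tossing. The sketch below applies the same literal "starts with" rule to a list of names, for example one obtained from S3::GetBucket -result-type names. It is a hypothetical illustration, not how S3::Toss is implemented, and it shows why a prefix of photos also selects photos2/b.jpg while photos/ does not.

```tcl
# Hypothetical preview helper: which names does a literal,
# case-sensitive "starts with" prefix select?
proc matchingResources {names prefix} {
    set hits {}
    set last [expr {[string length $prefix] - 1}]
    foreach name $names {
        # An empty prefix gives an empty leading range, so it
        # matches every name (as S3::Toss does with an empty prefix).
        if {[string range $name 0 $last] eq $prefix} {
            lappend hits $name
        }
    }
    return $hits
}

# Usage (requires network access):
#   set names [S3::GetBucket -bucket sample -result-type names]
#   puts [matchingResources $names of/interest/]
```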
LIMITATIONS
- The pure-Tcl MD5 checking is slow. If you are processing
files in the megabyte range, consider ensuring that a compiled (binary)
MD5 implementation is available.
- The commands S3::Pull and S3::Push fetch a
directory listing which includes timestamps and MD5 hashes, then invoke
S3::Get and S3::Put. If a complex -compare mode is
specified, S3::Get and S3::Put will invoke a HEAD operation
for each file to fetch timestamps and MD5 hashes of each resource again.
It is expected that a future release of this package will solve this
without any API changes.
- The commands S3::Pull and S3::Push fetch a
directory listing without using -max-count. The entire directory is
pulled into memory at once. For very large buckets, this could be a
performance problem. The author, at this time, does not plan to change
this behavior. Welcome to Open Source.
- S3::Sync is neither designed nor implemented yet.
The intention would be to keep changes synchronised, so changes could be
made to both the bucket and the local directory and be merged by
S3::Sync.
- Nor is -acl calc fully implemented. This
is primarily due to Windows not providing a convenient method for
distinguishing between local files that are "public-read" or
"public-read-write". Assistance figuring out TWAPI for this
would be appreciated. The U**X semantics are difficult to map directly as
well. See the source for details. Note that there are no tests for calc,
since it isn't done yet.
- The HTTP processing is implemented within the library,
rather than using a "real" HTTP package. Hence, multi-line
headers are not (yet) handled correctly. Do not include carriage returns
or linefeeds in x-amz-meta-* headers, content-type values, and so on. The
author does not at this time expect to improve this.
- Internally, S3::Push, S3::Pull, and
S3::Toss are all very similar and should be refactored.
- Using -compare never together with
-delete true, to remove files that have been deleted in one
place without copying changed files, is untested.
USAGE SUGGESTIONS
To fetch a "directory" out of a bucket, make changes, and store it
back:
file mkdir ./tempfiles
S3::Pull -bucket sample -prefix of/interest -directory ./tempfiles \
-timestamp aws
do_my_process ./tempfiles other arguments
S3::Push -bucket sample -prefix of/interest -directory ./tempfiles \
-compare newer -delete true
To delete local files that were deleted from S3 without otherwise updating
files:
S3::Pull -bucket sample -prefix of/interest -directory ./myfiles \
-compare never -delete true
FUTURE DEVELOPMENTS
The author intends to work on several additional projects related to this
package, in addition to finishing the unfinished features.
First, a command-line program allowing browsing of buckets and transfer of files
from shell scripts and command prompts would be useful.
Second, a GUI-based program allowing visual manipulation of bucket and resource
trees not unlike Windows Explorer would be useful.
Third, a command-line (and perhaps a GUI-based) program called
"OddJob" would use S3 to synchronize computation amongst
multiple servers running OddJob. An S3 bucket will be set up with a number of
scripts to run, and the OddJob program can be invoked on multiple machines to
run scripts on all the machines, each machine moving on to the next unstarted
task when it finishes its current one. This is still being designed, and it is intended primarily
to be run on Amazon's Elastic Compute Cloud.
BUGS, IDEAS, FEEDBACK
This document, and the package it describes, will undoubtedly contain bugs and
other problems. Please report them in the category amazon-s3 of the
Tcllib SF Trackers [http://sourceforge.net/tracker/?group_id=12883].
Please also report any ideas for enhancements you may have for either the
package or its documentation.
KEYWORDS
amazon, cloud, s3
CATEGORY
Networking
COPYRIGHT
Copyright (c) 2006, 2008 Darren New. All Rights Reserved. See LICENSE.TXT for terms.