Cold Storage Data Retrieval For AWS Glacier And Glacier Deep Archive

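AWS offers three retrieval tiers for S3 Glacier: Expedited, Standard, and Bulk. They differ in first-byte latency and retrieval cost. For Glacier, Expedited retrievals typically take 1 to 5 minutes, Standard 3 to 5 hours, and Bulk 5 to 12 hours. Glacier Deep Archive supports only Standard (within 12 hours) and Bulk (within 48 hours).
Example (prices as of January 2021): retrieving 500 archives of 1 GB each from Glacier costs 20 USD with Expedited, 5.03 USD with Standard, and 1.26 USD with Bulk.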
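The example figures are the sum of a per-GB fee and a per-request fee. The sketch below reproduces them under an assumption: it hardcodes the us-east-1 Glacier list prices as of January 2021 (Expedited 0.03 USD/GB plus 10 USD per 1,000 requests, Standard 0.01 USD/GB plus 0.05 USD per 1,000 requests, Bulk 0.0025 USD/GB plus 0.025 USD per 1,000 requests). The function name `retrieval_cost` is hypothetical; check the current pricing page [2] before relying on the numbers.

```shell
# Hypothetical helper: estimate the Glacier retrieval cost for COUNT objects of SIZE_GB each.
# ASSUMPTION: us-east-1 Glacier list prices as of January 2021; they change over time.
retrieval_cost() {
  local tier=$1 count=$2 size_gb=$3
  local per_gb per_1000_requests
  case "$tier" in
    expedited) per_gb=0.03;   per_1000_requests=10.00 ;;
    standard)  per_gb=0.01;   per_1000_requests=0.05 ;;
    bulk)      per_gb=0.0025; per_1000_requests=0.025 ;;
    *) echo "unknown tier: $tier" >&2; return 1 ;;
  esac
  # cost = data volume * per-GB fee + number of requests * per-request fee
  awk -v n="$count" -v s="$size_gb" -v g="$per_gb" -v r="$per_1000_requests" \
    'BEGIN { printf "%.4f\n", n * s * g + n * r / 1000 }'
}
```

For example, `retrieval_cost standard 500 1` prints 5.0250, which rounds to the 5.03 USD quoted above.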

Restoring A Single File Using The Management Console

Let’s assume you just want to restore one particular file. In that case, the Management Console is probably the easiest way. If you want to restore multiple files or complete directories, however, using the Management Console would be time-consuming madness. The section Restoring Multiple Files Using The AWS Command Line Interface (CLI) And S3cmd below describes a better way to do this.

  1. Choose the object you want to restore.
  2. Click the Initiate restore button.
  3. Choose the retrieval tier. It strongly affects how long it takes until you can access your file and how much you pay for the retrieval. Also specify how many days the restored copy should remain available for download.
  4. Depending on the chosen retrieval tier, the restore takes anywhere from minutes to several hours. If you change your mind and need the data sooner, you can upgrade the retrieval tier of the ongoing restore.
  5. When the restore has finished, you can download the object like any other object stored in S3 Standard.
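The two choices made in the console dialog (retrieval tier and number of days) correspond to the restore request the AWS CLI sends with `aws s3api restore-object`. The following is a minimal sketch of a single-object restore, assuming bash; `restore_request_json` and `restore_single_object` are hypothetical helper names, not part of the AWS CLI. It also rejects Expedited for Deep Archive objects, because that tier is not available there.

```shell
# Hypothetical helper: build the JSON passed to --restore-request.
# The tier must be Expedited, Standard, or Bulk; DEEP_ARCHIVE objects accept only Standard and Bulk.
restore_request_json() {
  local days=$1 tier=$2 storage_class=${3:-GLACIER}
  case "$tier" in
    Expedited|Standard|Bulk) ;;
    *) echo "invalid tier: $tier" >&2; return 1 ;;
  esac
  if [ "$storage_class" = "DEEP_ARCHIVE" ] && [ "$tier" = "Expedited" ]; then
    echo "Expedited retrieval is not available for DEEP_ARCHIVE" >&2
    return 1
  fi
  printf '{"Days":%d,"GlacierJobParameters":{"Tier":"%s"}}\n' "$days" "$tier"
}

# Hypothetical helper: CLI counterpart of the console's "Initiate restore" dialog.
restore_single_object() {
  local bucket=$1 key=$2 days=$3 tier=$4 storage_class=${5:-GLACIER}
  local request
  request=$(restore_request_json "$days" "$tier" "$storage_class") || return 1
  aws s3api restore-object --bucket "$bucket" --key "$key" --restore-request "$request"
}
```

Usage, with the bucket and key used elsewhere in this article: `restore_single_object my-cold-storage-bucket dir1/example.obj 1 Standard`.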

Restoring Multiple Files Using The AWS Command Line Interface (CLI) And S3cmd

If you have to restore multiple files or complete directories, the AWS Management Console is a pain to use: it does not offer this as a one-click operation. For this use case, the AWS Command Line Interface (CLI) [7] or the command-line tool S3cmd [6] are better choices. Both tools are available for Windows, Linux, and macOS and are easy to install. Follow the links [6, 7] to get them up and running.

Restore Data Via The CLI

Assume you want to restore every single object in a given bucket. The following steps get your data back by running CLI commands in a shell:

  1. Get a list of all objects within a bucket
  2. Initiate the restoration process of these objects
  3. Finally, download your data
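# Step 1: write the keys of all GLACIER objects to a file
# (use 'DEEP_ARCHIVE' instead of 'GLACIER' for Glacier Deep Archive objects)
aws s3api list-objects-v2 \
  --bucket my-cold-storage-bucket \
  --query "Contents[?StorageClass=='GLACIER']" \
  --output text \
  | awk -F'\t' '{print $2}' > glacier-object-list.txt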
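Without the awk filter, the command prints one tab-separated line per object with the following columns:
  1. ETag: a hash of the object
  2. Key: the object's name
  3. Last modified date
  4. File size in bytes
  5. Storage class
"84b933df90e6a97bb1f41fec82e101ad" dir1/example.obj 2021-01-21T16:30:36.000Z 10244 GLACIER
The awk command keeps only the second column, so glacier-object-list.txt ends up with one object key per line.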
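Picking the key by column position works as long as the text output has exactly these columns. Depending on the CLI version and the object metadata, additional fields (for example checksum information) can appear and shift the columns. A sketch of a more robust variant, assuming bash: projecting the query onto `[Key]` makes the CLI print one key per line, so no awk column selection is needed. The helper name `list_glacier_keys` is hypothetical, and the `None` filter assumes the CLI prints `None` when the query result is empty (which would also drop a key literally named `None`).

```shell
# Hypothetical helper: print the keys of all objects with the given storage class, one per line.
list_glacier_keys() {
  local bucket=$1 storage_class=${2:-GLACIER}
  aws s3api list-objects-v2 \
    --bucket "$bucket" \
    --query "Contents[?StorageClass=='${storage_class}'].[Key]" \
    --output text |
    awk '$0 != "None"'   # drop the placeholder printed for an empty result
}
```

Usage: `list_glacier_keys my-cold-storage-bucket > glacier-object-list.txt`, or pass `DEEP_ARCHIVE` as the second argument for Glacier Deep Archive objects.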
# Step 2: initiate the restore of a single object (Bulk tier, restored copy kept for 1 day)
aws s3api restore-object \
  --bucket my-cold-storage-bucket \
  --key dir1/example.obj \
  --restore-request '{
    "Days" : 1,
    "GlacierJobParameters" : {"Tier":"Bulk"}
  }'
# Monitor the status of your restore request
aws s3api head-object --bucket my-cold-storage-bucket --key dir1/example.obj
# Still in progress
{
"Restore": "ongoing-request=\"true\"",
"StorageClass": "GLACIER",
...
}
# Restore has been completed
{
"Restore": "ongoing-request=\"false\", expiry-date=\"Mon, 15 Feb 2021 00:00:00 GMT\"",
"StorageClass": "GLACIER",
...
}
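Instead of reading the JSON by eye, you can check the `Restore` field in a script. A minimal sketch, assuming bash: `restore_status` and `wait_for_restore` are hypothetical helpers, and the default polling interval of 900 seconds is an arbitrary assumption. It relies on `--query Restore --output text` printing the raw `Restore` string shown above, or `None` when no restore was requested.

```shell
# Hypothetical helper: classify the restore state of one object.
restore_status() {
  local bucket=$1 key=$2 restore
  restore=$(aws s3api head-object --bucket "$bucket" --key "$key" \
    --query Restore --output text) || return 1
  case "$restore" in
    *'ongoing-request="true"'*)  echo "in-progress" ;;
    *'ongoing-request="false"'*) echo "restored" ;;
    *)                           echo "not-requested" ;;
  esac
}

# Hypothetical helper: poll until the object is restored (interval in seconds).
wait_for_restore() {
  local bucket=$1 key=$2 interval=${3:-900} status
  while :; do
    status=$(restore_status "$bucket" "$key") || return 1
    case "$status" in
      restored) return 0 ;;
      not-requested) echo "no restore requested for $key" >&2; return 1 ;;
    esac
    sleep "$interval"
  done
}
```

Usage: `wait_for_restore my-cold-storage-bucket dir1/example.obj && echo "ready to download"`.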
# Send a restore request for every key listed in glacier-object-list.txt
while IFS= read -r KEY
do
  echo "Restoring object: $KEY"
  aws s3api restore-object \
    --bucket my-cold-storage-bucket \
    --key "$KEY" \
    --restore-request '{
      "Days" : 1,
      "GlacierJobParameters" : {"Tier":"Bulk"}
    }' \
    && echo "Restore request for $KEY sent"
done < glacier-object-list.txt
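With many objects, some requests may fail (for example, because a restore for that object is already in progress). A sketch of the same loop that keeps track of failures, assuming bash and one key per line in the input file; `restore_all` and the `.failed` output file are hypothetical names chosen for illustration.

```shell
# Hypothetical helper: send a restore request for every key in KEY_FILE and
# record the keys whose request failed in KEY_FILE.failed for a later retry.
restore_all() {
  local bucket=$1 key_file=$2 tier=${3:-Bulk} days=${4:-1}
  local key request failed=0
  request="{\"Days\":${days},\"GlacierJobParameters\":{\"Tier\":\"${tier}\"}}"
  : > "${key_file}.failed"
  # the extra test also processes a last line without a trailing newline
  while IFS= read -r key || [ -n "$key" ]; do
    [ -z "$key" ] && continue
    # stdin comes from /dev/null so the command cannot consume the key list
    if aws s3api restore-object --bucket "$bucket" --key "$key" \
         --restore-request "$request" < /dev/null; then
      echo "Restore request for $key sent"
    else
      echo "$key" >> "${key_file}.failed"
      failed=$((failed + 1))
    fi
  done < "$key_file"
  echo "$failed failed"
  [ "$failed" -eq 0 ]
}
```

Usage: `restore_all my-cold-storage-bucket glacier-object-list.txt Bulk 1`; rerun it on `glacier-object-list.txt.failed` to retry the failed keys.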
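# Step 3: once all restores have completed, download the bucket.
# --force-glacier-transfer is needed because restored objects keep the GLACIER storage class.
aws s3 sync s3://my-cold-storage-bucket ./backup --force-glacier-transfer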
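Objects whose restore is still running cannot be downloaded yet, so it can help to check the key list before starting the sync. A minimal sketch, assuming bash and the key file from step 1; `count_unrestored` is a hypothetical helper that treats an object as ready only when its `Restore` field contains `ongoing-request="false"`.

```shell
# Hypothetical helper: print how many keys in KEY_FILE are not yet restored
# (the keys themselves are reported on stderr).
count_unrestored() {
  local bucket=$1 key_file=$2 key restore pending=0
  while IFS= read -r key || [ -n "$key" ]; do
    [ -z "$key" ] && continue
    restore=$(aws s3api head-object --bucket "$bucket" --key "$key" \
      --query Restore --output text < /dev/null)
    case "$restore" in
      *'ongoing-request="false"'*) ;;
      *) echo "not yet restored: $key" >&2; pending=$((pending + 1)) ;;
    esac
  done < "$key_file"
  echo "$pending"
}
```

Usage: `[ "$(count_unrestored my-cold-storage-bucket glacier-object-list.txt)" -eq 0 ] && aws s3 sync s3://my-cold-storage-bucket ./backup --force-glacier-transfer`.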

Restore Data With The Open Source Tool S3cmd

Another way to retrieve your data from S3 is the open-source Python tool S3cmd [6]. The GPL-2-licensed tool supports the complete chain of operations on S3 objects and buckets (upload, query, retrieve, delete, etc.).

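# Install S3cmd on macOS via Homebrew [9]
brew install s3cmd
# Interactive setup: asks for your AWS access key, secret key, and other settings
s3cmd --configure
# Request a Bulk restore of every object in the bucket; restored copies stay available for 7 days
s3cmd restore --recursive \
  --restore-priority=bulk \
  --restore-days=7 \
  s3://my-cold-storage-bucket
# Once the restore has finished, download a folder
s3cmd sync s3://my-cold-storage-bucket/folder /destination/folder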

Summary

  1. The cost of storing data on Glacier depends on the storage class you choose. The colder the storage, the cheaper it gets: Glacier Deep Archive costs less than Glacier.
  2. Retrieving data takes from minutes to several hours (up to 48 hours for Bulk retrievals from Glacier Deep Archive). The retrieval tier (Expedited, Standard, or Bulk) determines how quickly your data becomes available. So only use Glacier or Glacier Deep Archive for files you very rarely need to access; otherwise, S3 Standard is the better choice.
  3. Retrieving data costs money, too. The cost depends on the retrieval tier and the amount of data you restore. Calculate your retrieval costs beforehand with the AWS Price Calculator [3].
  4. Deleting data from Glacier that is younger than 90 days incurs an extra charge (for Glacier Deep Archive, the minimum storage duration is 180 days).
  5. For retrieving single objects, the AWS Management Console is handy. For complete directories or multiple files, use the AWS CLI or S3cmd.
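The early-deletion charge in point 4 is prorated: you pay for the remaining days of the minimum storage duration. A sketch of that arithmetic, assuming bash; `early_deletion_fee` is a hypothetical helper, and it assumes the monthly storage price is prorated over a 30-day month. The price is a parameter; the usage example uses 0.004 USD per GB-month, an assumed January 2021 us-east-1 Glacier price.

```shell
# Hypothetical helper: prorated charge for deleting data before the minimum storage duration.
# ASSUMPTIONS: 30-day month; MIN_DAYS is 90 for Glacier and 180 for Glacier Deep Archive.
early_deletion_fee() {
  local size_gb=$1 days_stored=$2 price_per_gb_month=$3 min_days=$4
  awk -v s="$size_gb" -v d="$days_stored" -v p="$price_per_gb_month" -v m="$min_days" \
    'BEGIN {
       remaining = m - d                 # days left of the minimum duration
       if (remaining < 0) remaining = 0  # no fee once the minimum is reached
       printf "%.2f\n", s * p * remaining / 30
     }'
}
```

For example, `early_deletion_fee 100 30 0.004 90` prints 0.80: 100 GB deleted after 30 days is charged for the remaining 60 days, i.e. two months at 0.40 USD.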

References

  1. [Restoring Archived Objects] https://docs.aws.amazon.com/AmazonS3/latest/dev/restoring-objects.html
  2. [AWS S3 Pricing] https://aws.amazon.com/s3/pricing
  3. [AWS Price Calculator] https://calculator.aws
  4. [AWS Glacier API] https://awscli.amazonaws.com/v2/documentation/api/latest/reference/glacier/index.html
  5. [AWS Glacier FAQ] https://aws.amazon.com/glacier/faqs/
  6. [S3cmd] https://s3tools.org/s3cmd
  7. [Install AWS CLI] https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html
  8. [AWK Linux Man Page] https://linux.die.net/man/1/awk
  9. [Brew Package Manager] https://brew.sh

Frank Haubenschild