S3 is not (just) storage

Time and time again we see teams making the same mistake: treating Amazon S3, or any cloud storage for that matter, like a hard drive in the cloud. It's an easy trap to fall into, because S3 is so straightforward to use and, at roughly $2 a month for 100GB, practically free. This mental model leads to architecture decisions that work fine in development and testing but often create ballooning costs in production.

The Mental Model Problem

When we think of storage, we typically imagine a simple container that holds data, like a hard drive, a USB stick, or a proverbial bucket. This line of reasoning treats storage as a solved problem: if I can store my files securely, conveniently and reliably, there is not much more to think about. And since S3 offers virtually unlimited capacity and high reliability, it is easy to just stop there.

This "set and forget" mentality might be fine for your personal Dropbox, but it's a potentially costly oversight when building systems at scale. S3 isn't just a place to put things; it is a multidimensional service with at least four key dimensions that all contribute to your bill:

  • Storage - What you pay to keep your data
  • Requests - What you pay to interact with your data
  • Retrieval - What you pay to get your data back
  • Transfer - What you pay to move your data
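To make the four dimensions concrete, here is a minimal Python sketch of a monthly bill broken down along them. The prices are assumptions: approximate us-east-1 list prices for S3 Standard at the time of writing. The model also ignores free tiers, volume discounts and the finer request categories, so treat it as an illustration rather than a calculator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prices:
    """Assumed per-unit prices in USD (approximate us-east-1 S3 Standard list prices)."""
    storage_per_gb_month: float = 0.023  # Storage
    put_per_1000: float = 0.005          # Requests: PUT, COPY, POST, LIST
    get_per_1000: float = 0.0004         # Requests: GET, SELECT
    retrieval_per_gb: float = 0.0        # Retrieval: no per-GB fee in Standard
    transfer_out_per_gb: float = 0.09    # Transfer: out to the internet


@dataclass(frozen=True)
class Usage:
    stored_gb: float
    puts: int
    gets: int
    retrieved_gb: float
    transferred_out_gb: float


def monthly_cost(usage: Usage, prices: Prices = Prices()) -> dict[str, float]:
    """Split one month's S3 bill into the four cost dimensions plus a total."""
    breakdown = {
        "storage": usage.stored_gb * prices.storage_per_gb_month,
        "requests": usage.puts / 1000 * prices.put_per_1000
        + usage.gets / 1000 * prices.get_per_1000,
        "retrieval": usage.retrieved_gb * prices.retrieval_per_gb,
        "transfer": usage.transferred_out_gb * prices.transfer_out_per_gb,
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown
```

Consider a month with 100GB stored, 10,000 PUTs, a million GETs and 50GB served to the internet. Under these assumed prices that comes to roughly $2.30 for storage, $0.45 for requests and $4.50 for transfer, so the "practically free" storage line is less than a third of the total.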

This variety of dimensions means there are quite a few ways your AWS bill can quietly balloon while you're not looking. If S3 were just a monolithic product with a one-size-fits-all approach, we'd have no choice but to watch our costs steadily increase day in and day out. Fortunately, that's not the case.

S3 Storage Classes

AWS offers a range of S3 storage classes designed specifically to address this multidimensional cost structure, allowing us to actively manage our cloud storage expenses instead of just paying whatever bill comes our way.

Each class represents a specific tradeoff across these dimensions, optimized for different use cases and access patterns.

Storage classes aren't properties of the bucket; they are tied to each individual object. A bucket can contain objects with different storage classes, just like a real bucket can hold various items. You can have some objects in Standard, others in Glacier, and more in Intelligent-Tiering, all in the same general purpose bucket. The exception is S3 Express One Zone, whose objects live in a separate bucket type, the directory bucket.
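Because the storage class is reported per object, one way to see what a bucket actually holds is to group a listing by class. The sketch below assumes the shape of the `Contents` entries returned by the ListObjectsV2 API, where each entry has `Size` and `StorageClass`. `list_bucket_objects` uses boto3's paginator and needs AWS credentials. `bytes_by_storage_class` is plain Python and works on any such list.

```python
from collections import defaultdict
from typing import Iterable, Iterator, Mapping


def bytes_by_storage_class(objects: Iterable[Mapping]) -> dict[str, int]:
    """Sum object sizes per storage class.

    Each entry is assumed to look like a ListObjectsV2 "Contents" item,
    i.e. a mapping with "Size" and "StorageClass" keys.
    """
    totals: dict[str, int] = defaultdict(int)
    for obj in objects:
        # The class is a per-object attribute; fall back to STANDARD if absent.
        totals[obj.get("StorageClass", "STANDARD")] += obj["Size"]
    return dict(totals)


def list_bucket_objects(bucket: str) -> Iterator[Mapping]:
    """Yield every object entry in `bucket` (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the pure helper above works without boto3

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        yield from page.get("Contents", [])
```

For a hypothetical bucket named `my-bucket`, calling `bytes_by_storage_class(list_bucket_objects("my-bucket"))` returns the total bytes held in each class, for example under `"STANDARD"` and `"GLACIER"` keys.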

A picture is worth a thousand words, as they say, so let's look at a visual representation of how the dimensions we discussed compare across storage classes (lower values are better, meaning lower cost or better performance):

[Figure: S3 Storage Classes Comparison. Radar chart comparing S3 Express One Zone, S3 Standard, S3 Standard-IA, S3 One Zone-IA, S3 Glacier Instant Retrieval, S3 Glacier Flexible Retrieval and S3 Glacier Deep Archive]

The radar chart visualizes different dimensions on a single scale of 1-10 for easier comparison. While this simplification sacrifices some precision, it helps illustrate the relative trade-offs between storage classes.

For exact values, refer to the detailed comparison table below.

The first thing you've probably noticed is the Express One Zone storage cost point flying "off the chart". That is because the price of keeping data in this storage class dwarfs all the others: it is about 7 times what you would pay for the same amount of data in the Standard class. Choosing this class is a straightforward decision: you'll do it when you absolutely need single-digit millisecond latency. To get that, you pay both in dollars and in resilience, since the data is stored in only one Availability Zone.

You'll also see that we've added Retrieval Latency as an additional dimension to the chart. While it doesn't directly appear on your bill as a line item, it guides architectural decisions that can significantly impact your overall costs and application performance.

This visualization reveals the core tradeoff in S3 storage classes: as you move from Standard toward Glacier Deep Archive, storage costs dramatically decrease while retrieval generally becomes slower and more expensive. Transfer costs remain constant across all classes, a reminder that some dimensions can't be optimized through storage class selection alone.

Now that we know the options and their trade-offs, the obvious question is: how do we choose?

Understanding Object Access Patterns Over Time

One of the most critical factors in choosing the right storage class is understanding how your data is accessed throughout its lifetime. The diagram below illustrates five common access patterns we see in real-world applications:

[Figure: S3 object access patterns]

Unpredictable access pattern: Some objects have sporadic, unpredictable access. They might be heavily accessed for a period, then dormant, then accessed again later with no discernible pattern. User-generated content often follows this pattern, making it ideal for S3 Intelligent-Tiering, which automatically moves objects between tiers based on actual usage.
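Intelligent-Tiering is chosen per object at upload time through the `StorageClass` parameter of PutObject. Here is a minimal boto3 sketch. The bucket and key names are hypothetical, and the helper function names are our own.

```python
def intelligent_tiering_put_args(bucket: str, key: str, body: bytes) -> dict:
    """Keyword arguments for S3 PutObject that place the object in
    Intelligent-Tiering instead of the default STANDARD class."""
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        "StorageClass": "INTELLIGENT_TIERING",
    }


def upload_user_content(bucket: str, key: str, body: bytes) -> None:
    """Upload one object into Intelligent-Tiering (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the pure helper works without boto3

    boto3.client("s3").put_object(**intelligent_tiering_put_args(bucket, key, body))
```

Keep in mind that Intelligent-Tiering adds a small per-object monitoring charge. Objects smaller than 128 KB are not auto-tiered at all, so the class pays off mainly for larger objects.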

Diminishing access pattern: This is perhaps the most common pattern - objects are frequently accessed when new, then gradually accessed less over time. Think of reports, product images, or news articles that receive less attention as they age. A lifecycle configuration transitioning from S3 Standard → Standard-IA → Glacier can significantly reduce costs for this pattern.
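A lifecycle rule for this pattern might look like the sketch below. The 30- and 90-day thresholds and the `reports/` prefix are illustrative assumptions, and `GLACIER` is the API name for Glacier Flexible Retrieval. The helper checks only two constraints: objects must be at least 30 days old before moving to Standard-IA, and the transitions must be in order. S3 validates everything else when the configuration is applied.

```python
def diminishing_access_lifecycle(prefix: str, ia_after_days: int = 30,
                                 glacier_after_days: int = 90) -> dict:
    """Lifecycle configuration moving objects STANDARD -> STANDARD_IA -> GLACIER.

    Day counts are measured from object creation.
    """
    if ia_after_days < 30:
        raise ValueError("S3 requires objects to be at least 30 days old for STANDARD_IA")
    if glacier_after_days <= ia_after_days:
        raise ValueError("the GLACIER transition must come after the STANDARD_IA one")
    return {
        "Rules": [{
            "ID": f"diminishing-access-{prefix.strip('/') or 'all'}",
            "Filter": {"Prefix": prefix},
            "Status": "Enabled",
            "Transitions": [
                {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                {"Days": glacier_after_days, "StorageClass": "GLACIER"},
            ],
        }]
    }


def apply_lifecycle(bucket: str, config: dict) -> None:
    """Apply a lifecycle configuration (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the pure helper works without boto3

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config)
```

Be aware that `put_bucket_lifecycle_configuration` replaces the bucket's entire lifecycle configuration. Merge new rules with any existing ones before applying them.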

Periodic access pattern: Some objects are accessed at predictable intervals, such as weekly reports, monthly analytics, or quarterly reviews. For these, Standard-IA often makes sense if accesses are more than 30 days apart, because the lower storage price then outweighs the retrieval fees.
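The 30-day rule of thumb can be checked with a back-of-the-envelope calculation. This sketch assumes approximate us-east-1 list prices for Standard and Standard-IA. It ignores request charges, Standard-IA's 30-day minimum storage duration and its 128 KB minimum billable object size. It compares the two classes for an object read in full a given number of times per month.

```python
# Assumed us-east-1 list prices in USD; check current pricing for your region.
STANDARD_STORAGE = 0.023      # per GB-month
STANDARD_IA_STORAGE = 0.0125  # per GB-month
STANDARD_IA_RETRIEVAL = 0.01  # per GB retrieved


def monthly_cost_standard(size_gb: float) -> float:
    """Storage-only monthly cost in S3 Standard (no retrieval fee)."""
    return size_gb * STANDARD_STORAGE


def monthly_cost_standard_ia(size_gb: float, full_reads_per_month: float) -> float:
    """Standard-IA trades a lower storage price for a per-GB retrieval fee."""
    return size_gb * (STANDARD_IA_STORAGE + full_reads_per_month * STANDARD_IA_RETRIEVAL)


def break_even_reads_per_month() -> float:
    """Full reads per month at which both classes cost the same."""
    return (STANDARD_STORAGE - STANDARD_IA_STORAGE) / STANDARD_IA_RETRIEVAL
```

Under these prices the break-even point is about 1.05 full reads per month. Reading the data roughly once a month is a wash, while quarterly access makes Standard-IA clearly cheaper.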

Front-loaded access pattern: These objects see intense activity immediately after creation but then are rarely or never accessed again. Log files and event data often follow this pattern. After the initial processing period, these objects are prime candidates for rapid transition to cold storage classes.
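For front-loaded data, the lifecycle rule can skip Standard-IA entirely and move objects straight to a cold class. Here is another minimal sketch. The `logs/` prefix, the 7-day processing window and the function name are assumptions; adjust them to your pipeline.

```python
def front_loaded_lifecycle(prefix: str = "logs/", cold_after_days: int = 7,
                           cold_class: str = "GLACIER") -> dict:
    """Lifecycle configuration moving objects straight from STANDARD to a cold class.

    Standard-IA is skipped: data that is almost never read again gains little
    from millisecond access, and IA's 30-day minimum age would delay the move.
    """
    if cold_class not in {"GLACIER", "DEEP_ARCHIVE"}:
        raise ValueError("expected GLACIER or DEEP_ARCHIVE")
    if cold_after_days < 0:
        raise ValueError("days must be non-negative")
    return {
        "Rules": [{
            "ID": f"front-loaded-{prefix.strip('/') or 'all'}",
            "Filter": {"Prefix": prefix},
            "Status": "Enabled",
            "Transitions": [{"Days": cold_after_days, "StorageClass": cold_class}],
        }]
    }
```

One caveat applies. Glacier Flexible Retrieval and Deep Archive have minimum storage durations of 90 and 180 days respectively, so objects deleted earlier are still billed for that minimum.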

Archive-only pattern: Some objects are stored purely for archival, compliance, or emergency purposes with no expectation of regular access. These should go directly to S3 Glacier or S3 Glacier Deep Archive, possibly after a brief verification period in S3 Standard.
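Choosing between the two archive classes mostly comes down to how long a restore may take. AWS's published restore windows are:

  • Deep Archive standard retrievals complete within 12 hours
  • Glacier Flexible Retrieval standard retrievals typically take 3 to 5 hours

The sketch below collapses these into a single 12-hour threshold, which is a simplifying assumption. It then uploads directly into the chosen class with boto3's `upload_file`. The function names are our own.

```python
def archive_storage_class(max_wait_hours: float) -> str:
    """Pick an archive class from how long a restore is allowed to take."""
    # Deep Archive standard restores finish within 12 hours;
    # anything more urgent goes to Glacier Flexible Retrieval ("GLACIER").
    return "DEEP_ARCHIVE" if max_wait_hours >= 12 else "GLACIER"


def upload_to_archive(path: str, bucket: str, key: str, max_wait_hours: float) -> None:
    """Upload a local file directly into an archive class (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the pure helper works without boto3

    boto3.client("s3").upload_file(
        path, bucket, key,
        ExtraArgs={"StorageClass": archive_storage_class(max_wait_hours)},
    )
```

If a restore must finish in milliseconds rather than hours, S3 Glacier Instant Retrieval (`GLACIER_IR`) is the better fit.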

By recognizing which pattern your data follows, you can design a storage strategy that significantly reduces costs while maintaining necessary performance. The right approach might be different for each type of object in your application.

Hopefully you now see that paying those 2 bucks a month for 100GB in S3 Standard is throwing money away if you're never going to touch those files again. For rarely accessed data, cold storage options like Glacier can reduce your storage costs by up to 95%!
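The 95% figure is easy to sanity-check. This sketch uses assumed us-east-1 list prices per GB-month. Verify them for your region, since prices change.

```python
# Assumed us-east-1 list prices per GB-month in USD; verify for your region.
PRICES = {
    "STANDARD": 0.023,
    "GLACIER": 0.0036,        # S3 Glacier Flexible Retrieval
    "DEEP_ARCHIVE": 0.00099,  # S3 Glacier Deep Archive
}


def storage_savings(cold_class: str, baseline: str = "STANDARD") -> float:
    """Fractional reduction in storage price compared with the baseline class."""
    return 1 - PRICES[cold_class] / PRICES[baseline]


for cls in ("GLACIER", "DEEP_ARCHIVE"):
    print(f"{cls}: {storage_savings(cls):.1%} cheaper, "
          f"100GB costs ${100 * PRICES[cls]:.2f}/month")
```

With these prices, Deep Archive comes out about 95.7% cheaper than Standard, around 10 cents a month for 100GB. Glacier Flexible Retrieval comes out about 84% cheaper.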

Do not overthink it

"What if I retrieve this file 3 times this year instead of 2? Or what if it's 5 times? Or—oh no—what if there's a sudden audit and we need to retrieve everything at once? At what magical threshold does cold storage actually save money?" It's easy to fall down the rabbit hole of cost calculus, plotting retrieval scenarios like you're planning a chess game six moves ahead. But it's rarely worth the effort. That kind of analysis only makes sense if you handle huge amounts of data, and even more importantly, actually have hard data on S3 usage that you can rely on.

In the majority of cases, though, common sense and some basic planning will get you very far. When saving your objects, ask yourself these questions:

  1. Will access patterns change over time and how? (Predictable, unpredictable)
  2. How often will I access this data? (Frequently, occasionally, rarely, never)
  3. How quickly do I need access when I do? (Milliseconds, minutes, hours)
  4. How long will I keep this data? (Days, months, years)

The answers form a profile that points to the optimal storage class for your use case. Or, even more conveniently, just use our free S3 Storage Class Advisor tool.
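If you'd rather reason it through yourself, the four questions can be turned into a rough rule of thumb. The function below is a hypothetical heuristic for illustration, not the logic behind the S3 Storage Class Advisor tool, and its enum names are assumptions. The 30-, 90- and 180-day cutoffs follow the minimum storage durations of Standard-IA, the Glacier classes and Deep Archive.

```python
from enum import Enum


class Frequency(Enum):
    FREQUENT = "frequently"
    OCCASIONAL = "occasionally"
    RARE = "rarely"
    NEVER = "never"


class Latency(Enum):
    MILLISECONDS = "milliseconds"
    MINUTES = "minutes"
    HOURS = "hours"


def suggest_storage_class(predictable: bool, frequency: Frequency,
                          latency: Latency, retention_days: int) -> str:
    """Hypothetical rule of thumb mapping the four questions to an S3 storage class."""
    if not predictable:
        return "INTELLIGENT_TIERING"  # let S3 track actual usage
    if frequency is Frequency.FREQUENT or retention_days < 30:
        return "STANDARD"             # hot or short-lived data
    if retention_days < 90:
        return "STANDARD_IA"          # too short-lived for the 90-day Glacier minimums
    if latency is Latency.MILLISECONDS:
        # Both classes serve reads in milliseconds; Glacier IR is cheaper to store
        # but charges more per GB retrieved.
        return "STANDARD_IA" if frequency is Frequency.OCCASIONAL else "GLACIER_IR"
    if latency is Latency.MINUTES or retention_days < 180:
        return "GLACIER"              # Flexible Retrieval; expedited restores take minutes
    return "DEEP_ARCHIVE"             # restores take hours; cheapest per GB
```

It is deliberately coarse. Request charges, object sizes and transfer are all left out, which is exactly the kind of detail the "do not overthink it" advice says you can usually skip.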

Conclusion

If all this sounds like too much work for too little gain, think again. It's easy to get started by following the few simple rules outlined above, and the rewards will show up in your very next monthly bill.

Sure, there's a lot more that can be said and done here, and we'll cover some of it in future posts. But for now, just repeat after me: S3 is not just storage, S3 is not just storage, S3 is not just storage...