The DynamoDB Paper (brooker.co.za)
256 points by krnaveen14 on July 14, 2022 | 91 comments



Rick Houlihan did a talk a few years ago about designing the data layer for an application using DynamoDB. The most common reaction I get from people I show it to - most of them Amazon SDEs who operate services that use DynamoDB - is "Holy shit, what is this wizardry?!"

https://youtu.be/HaEPXoXVf2k

One of the biggest mistakes people make with dynamo is thinking that it's just a relational database with no relations. It's not.

It's an incredible system, but it requires a lot of deep knowledge to get the full benefits, and it requires you, often, to design your data layer very well up-front. I actually don't recommend using it for a system that hasn't mostly stabilized in design.

But when used right, it's an incredibly performant beast of a data store.


It's worth noting that a lot of the early database designs, including this 2018 video, pre-date some dramatic improvements to DynamoDB usability.

I think the biggest ones were:

- an increase in the number of GSIs you can create (Dec 2018) [1]

- making on-demand possible [2]

- an increase in the default limit for number of tables you can create (Mar 2022) [3]

I don't think these new features necessarily make the single-table, overloaded GSI strategy that's discussed in the video obsolete, but they enable applications which are growing to adopt an incremental GSI approach and use multiple tables as their data access patterns mature.

Some other posters have recommended Alex DeBrie's dynamodb book and I also think that's an excellent resource, but I'd caution people who are getting into dynamodb not to be scared by the claims that dynamodb is inflexible to data access changes, since AWS has been adding a lot of functionality to support multi-table, unknown access patterns, emerging secondary indexes, etc.

- [1] https://aws.amazon.com/about-aws/whats-new/2018/12/amazon-dy...

- [2] https://aws.amazon.com/blogs/aws/amazon-dynamodb-on-demand-n...

- [3] https://aws.amazon.com/about-aws/whats-new/2022/03/amazon-dy...


Something else important to mention is that DynamoDB now re-consolidates partitions.

This is a lousy explanation, but read/write quota is split evenly over all partitions. Each partition is created based on the hash key used, and there's an upper limit on how much data can be stored in any given partition. So if you end up with a hot hash key with lots of stuff in it, that data gets split over more and more partitions, and the throughput available to any one partition goes down (the quota is split evenly over all of them).
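As a back-of-the-envelope illustration (made-up numbers; the real partition math has more inputs, so treat this purely as a sketch of the dilution effect):

    # Illustrative only: table-level quota is divided across partitions,
    # so a hot key confined to one partition only ever sees its slice.
    provisioned_rcu = 3000   # hypothetical table-level read capacity units
    partitions = 100         # partitions accumulated as the hot key grew

    rcu_per_partition = provisioned_rcu / partitions
    print(rcu_per_partition) # 30.0 RCU available to any single partition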

I believe this is still a general risk, and you need to be extremely canny about your use of hash key to avoid it, but historically they couldn't reconsolidate partitions. So you'd end up with a table in a terrible state with quota having to be sky high to still get effective performance. The only option then was to completely rotate tables. New table with a better hash-key, migrate data (or whatever else you needed to do).

Now at least, once the data is gone, the partitions will reconsolidate, so an entire table isn't a complete loss.


This bit me badly - an application that did significant autoscaling hit a peak of 30,000 read/write requests per second, but typically did more like 300.

The Amazon support engineer told us that we had over a hundred partitions (which even he admitted was high for that request volume), and so our quota was effectively giving us 0 IOPS per partition. This obviously didn't work, and their only solution was "scale it back up, copy everything to a new table". Which we did, but it was an engineering effort I'd rather have avoided.


People don't need to be scared; they just need to do their homework.

In my opinion, having more tables and more GSIs available won't help you very much if you started with a flawed data model (unless you kept making the same design mistakes 256 times). A team that tries to claw back from a flawed table design by piling up GSIs is just in for a world of pain.

So if you are planning to go with Dynamo:

- Read about the data modeling techniques

- Figure out your access patterns

- Check if your application and model can withstand the eventual consistency of GSIs

- Have a plan to rework your data model if requirements change: Are you going to incrementally rewrite your table? Are you going to export it and bulk load a fixed data model? How much is that going to cost?


I also recommend Alex DeBrie's "The DynamoDB Book" (https://www.dynamodbbook.com/). It is a great resource that talks about these design patterns in depth. It has served me and my team well over the past few years.


Seconded! Alex DeBrie is a great teacher.


For explicitness & searchability, commenting with the title of this talk, which is indeed excellent, not limited to DynamoDB, and which was kind of a revelation after years of using DynamoDB suboptimally:

Rick Houlihan - AWS re:Invent 2018: Amazon DynamoDB Deep Dive: Advanced Design Patterns for DynamoDB (DAT401) , https://www.youtube.com/watch?v=HaEPXoXVf2k

It should be watched along with reading the associated doc: https://docs.aws.amazon.com/amazondynamodb/latest/developerg...


Definitely one of my favorite talks by Rick and I apply lessons learned in that video on a daily basis.

Must have watched that video about 4-5 times before I really grasped the topics, since I started my career with the concept of relational databases burned into my head. Breaking from that pattern of thought was difficult, initially.


Indeed, with GSIs etc. you can implement a priority queue or store data in the order you want. Once you are clear on the access patterns of your app, DynamoDB is amazing to model for and will scale with your app. But if you are not clear about your app's access patterns or need ad-hoc queries, then DynamoDB is not a good fit.
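To make the priority queue idea concrete, here's a minimal sketch built on a sort key (the table name, key layout, and boto3 usage are my own assumptions, not a prescribed pattern):

    import boto3
    from boto3.dynamodb.conditions import Key

    # Hypothetical layout: all queue items share one partition key, and the
    # sort key encodes a zero-padded priority plus a job id. Items under a
    # partition key are stored in sort-key order, so the highest-priority
    # job is simply the first item returned.
    table = boto3.resource("dynamodb").Table("app-table")  # assumed name

    def enqueue(priority: int, job_id: str, payload: dict) -> None:
        table.put_item(Item={
            "PK": "QUEUE#jobs",
            "SK": f"{priority:05d}#{job_id}",  # e.g. "00010#job-42"
            **payload,
        })

    def peek_next_job():
        resp = table.query(
            KeyConditionExpression=Key("PK").eq("QUEUE#jobs"),
            ScanIndexForward=True,  # ascending by sort key
            Limit=1,
        )
        return resp["Items"][0] if resp["Items"] else None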


Can be performant, nowadays anyway. Worked with a team who built their own implementation because Amazon's was too slow and expensive.

It's a weird model. Too small of a dataset and it doesn't quite make sense to use Dynamo. Too big of a dataset and it's full of footguns. Medium-sized may be too expensive.


Too-small seems to be the perfect use case for DDB. I need someplace to stash stuff and look it up by key. A full RDS is overkill, as is anything else that requires nodes that charge by the hour.


Thank you for this recommendation, I'm on a DynamoDB contract job and... really learning to think hard about key structure and designing for efficient querying, rather than efficient storage.


Thanks, bookmarked this. It's good to see a proper take on data modelling on document stores instead of just "throw any old JSON in there, it'll be fine!!!"


I've been working with DynamoDB daily for a few years now, and whilst I like working with it and the specific scenario it solves for us, I'd still urge anyone thinking about using it to carefully reconsider whether their problem is truly unique enough that a traditional RDBMS couldn't handle it with some tuning. They can be unbelievably performant and give so much stuff for free.

Designing an application specifically for DynamoDB will take _a lot_ of time and effort. I think we could have saved almost a third of our entire development time had we used more of the boring stuff.


"I'd still urge anyone thinking about using it to carefully reconsider whether their problem is truly unique enough that a traditional RDBMS couldn't handle it with some tuning."

Lately, the problem I've seen is people who haven't even considered whether their problem is truly unique enough that a traditional RDBMS couldn't handle it without some tuning. (Here I don't count "set up the obvious index" as "tuning", because if you're using a non-RDBMS the same work is encompassed in figuring out what to use as keys. No escaping that one regardless of technology.)

I'm losing track of the number of teams in my company I've seen switching databases after they rolled to production because it turns out they picked a database that doesn't support the primary access pattern for their data in some cases, or in other cases, a very common secondary access pattern. In all the cases I've seen so far, it's been for quantities of data that an RDBMS would have chewed up and spat out without even noticing. It's amazing how much trouble you can get yourself into with non-relational databases with just a few hundred megabytes of data, or even a few tens of megabytes, if you fall particularly hard for the "it's fast and easy!" hype and end up accidentally writing a pessimal schema because you thought using a non-relational database meant you got to think less about your schema than with a relational DB.

That is precisely backwards; NoSQL-type DBs get their power from you spending a lot more time and care in thinking about exactly how you plan on accessing data. Many NoSQL databases loosen the constraints on what you can store in a given record, but in return they are a great deal more fussy about how you access records. If you want to skip careful design of how you access records, you want the relational DB. And nowadays, tossing a JSON field into a relational row is quite cheap and effective for those "catch alls" in the schema.

There's some interesting hybrids out there now if you want a bit of both worlds. For instance, Clickhouse is not a traditional SQL database, but it handles a lot of SQL-esque workloads more gracefully than many other NoSQL-esque databases. You can get much farther with "I need a NoSQL-style database, but every once in a while I need an SQL-like bit of functionality" than you can in something like Cassandra.


+1

Discovered this while building https://github.com/plutomi/plutomi as I was enamored by Rick's talks and guarantees of `performance at any scale`. In reality, Dynamo was solving scaling issues that we didn't have, and the number of times I've had to rework something to get around some of the quirks of Dynamo led to a lot of lost dev time.

Now that the project is getting more complex, doing simple things such as "searching" (for our use case) is virtually impossible without hosting an ElasticSearch cluster, where a simple LIKE '%email%' in Postgres would have sufficed.

Not saying it's a bad DB at all, but you really need to know your access patterns and plan accordingly. Dynamo streams are a godsend and combined with EventBridge you can do some powerful things for asynchronous events. Not paying while it's not running with on demand is awesome, and the performance is truly off the charts. Just please know what you are getting into. In fact, I'd recommend only using Dynamo if you are migrating a "finished" app vs using it for apps that are still evolving


I think there are good reasons to choose DynamoDB over a RDBMS that have nothing to do with scalability.

I've used DynamoDB several times over the past several years in the context of providing a datastore for a microservice. In all cases it was cheaper and easier than RDS, and the ability to add GSIs has enabled me to adapt to all of the new access patterns I've had to deal with.

For us, DynamoDB has become a 'boring' option.


I think it also depends on the system you're using it on. I think one of the biggest advantages of DDB is that it scales so well (with good design to avoid hot partitions). Afaik, RDBMS simply cannot scale in the same way due to their design. Yes, they can scale somewhat, but as you said it requires lots of tuning, and you'll still reach a hardish limit.


One partition of DDB is incredibly tiny compared to one partition of an RDBMS. You can push that one partition of RDBMS pretty far before you're forced to design sharding into your system. With DDB you are basically forced to design sharding into your partition keys up front or you will have hot partition issues. This is by far the most common problem I see with teams using DDB, so brushing it off as "with good design to avoid hot partitions" is understating the scope of the problem.
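For anyone unfamiliar, the usual mitigation is write sharding - spreading a hot logical key across several partition keys and fanning reads back out. A rough sketch of the idea, with invented table and attribute names:

    import random
    import boto3
    from boto3.dynamodb.conditions import Key

    N_SHARDS = 10  # chosen up front; hard to change later, which is the point
    table = boto3.resource("dynamodb").Table("events")  # assumed table name

    def write_event(customer_id: str, event_id: str, attrs: dict) -> None:
        # Spread one hot customer across N partition keys.
        shard = random.randrange(N_SHARDS)
        table.put_item(Item={
            "PK": f"CUST#{customer_id}#{shard}",
            "SK": event_id,
            **attrs,
        })

    def read_events(customer_id: str) -> list:
        # Reads have to fan out across every shard and merge the results.
        items = []
        for shard in range(N_SHARDS):
            resp = table.query(
                KeyConditionExpression=Key("PK").eq(f"CUST#{customer_id}#{shard}")
            )
            items.extend(resp["Items"])
        return items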


All databases scale the same way - by partitioning and sharding the dataspace. RDBMS have harder restrictions due to the features they provide and the performance expectations, but you can just as easily use a bunch of relational servers to partition a table (or several) across them by range or hashes of the primary key.

That's basically what key/value stores like DynamoDB do, and why DynamoDB was even built on MySQL (at least originally).


"can just as easily use a bunch of relational servers to partition a table" is not true at all. Managing, maintaining and tuning a sharded relational cluster is an astonishing amount of operations work. partition management, re-partioning, partition failover / promotions / demotions, query routing, shard discovery, upgrades... it goes on an on. All this work is gone if you pick dynamo. Not saying that dynamo is always better, but IMHO people very much underestimate the ops cost of running a sharded relational cluster at scale.


The point is the scaling fundamentals are the same across databases.

Whether that work is managed or not is a different topic, and you can find plenty of managed offerings of scale-out relational databases.


"just as easily" would be the contested part, I'd guess


> Designing application specifically for DynamoDB will take _a lot_ of time and effort

Disagree with this. Your team could think of it as a document database, and you can have utility libraries that filter and sort based on PK / SK combinations to provide a seamless experience.


If you want your DynamoDB table to scale well you'll have to put in a lot of upfront effort.


> give so much stuff for free

Interesting choice of words. Performance wise, sure. Money wise? I'm still waiting for a SQL database with pay-per-request pricing. The cost difference is enormous, particularly when you remember that you don't need to spend manpower managing the underlying hardware.

Engineering tradeoffs are more complicated than only considering raw scalability performance and "I can run it myself on a cheap Raspberry Pi".


>Interesting choice of words. Performance wise, sure. Money wise? I'm still waiting for a SQL database with pay-per-request pricing. The cost difference is enormous, particularly when you remember that you don't need to spend manpower managing the underlying hardware.

I assume you're saying DynamoDB is less expensive than SQL because of pay-per-request.

Working on applications with a modest amount of data (a few TB over a few years), pay-per-request has been incredibly expensive even with scaled provisioning. I would much rather have an SQL database and pay for the server(s). Then I could afford a few more developers!


Have you looked at Planetscale?


Who manages hardware these days? Aurora works quite well.


Is there a specific reason why you say "Designing an application specifically for DynamoDB will take _a lot_ of time and effort"? Are you talking about migrating from an RDBMS to DynamoDB? Because my experience designing for DynamoDB was very similar to any other NoSQL DB.


You really need to consider your access patterns up front with DynamoDB. Any changes to those during application development can be very time consuming. There are limitations on how many local and global secondary indexes you can have. You also can't easily add them to existing tables. However, you can use multiple databases to get the best of both worlds. At my employer, we typically store domain entities in DynamoDB as the source of truth. However, we replicate some entities to secondary databases like OpenSearch when we have access patterns that require ad-hoc querying.


A lot is transferrable to NoSQL and key-value stores in general, though DDB has plenty of quirks of its own. Understanding your problem really is the key. A lot of problems turn out to be quite relational after all.

You definitely can build just about anything with DDB, it's often just not worth the time when most can be solved by existing tools


I mean if you go the single table route… https://aws.amazon.com/blogs/compute/creating-a-single-table...
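The gist of the single-table approach is encoding the entity type into overloaded keys so related items land together under one partition key. A toy illustration, with all names and values invented:

    # Toy single-table layout: one user and their orders share a partition.
    items = [
        {"PK": "USER#123", "SK": "PROFILE",              "name": "Ada"},
        {"PK": "USER#123", "SK": "ORDER#2022-07-01#900", "total": 42},
        {"PK": "USER#123", "SK": "ORDER#2022-07-14#901", "total": 17},
    ]
    # Query(PK = "USER#123", SK begins_with "ORDER#") then returns that
    # user's orders in date order without touching any other entity.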


To be fair, you can end up spending a lot of time on the boring stuff as well.


Could you elaborate on your (or a hypothetical) use case where dynamo db makes sense? I for one can never come up with something better served by rdbms or s3.


I'll give you two use cases that I use for DynamoDB, where otherwise I'm primarily a MySQL shop

1) Simple: I have a system that constantly records and stores 30-minute MP3 files of audio streams (1000's of them) in S3. We write the referencing metadata to a table in DynamoDB where users can query by date/time. Given the sheer number of items (hundreds of millions), we saw a far worse performance-to-cost ratio on MySQL vs. Dynamo.

2) Complex: I have a system that ingests thousands of tiny MP3 files a minute into S3 and writes the associated metadata to DynamoDB. DynamoDB then has a stream associated with it that runs a lambda to consolidate statistics to another table and stream that metadata to clients via other lambdas or data streams.

Those are two great use cases where we saw better usage patterns with Dynamo vs MySQL.


When you require point and range queries. For example: given a cart-id, fetch the skus; given an authz-token, fetch scopes; given a user-id and a time-range, fetch a list of pending order-ids.
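A sketch of the last of those patterns with boto3 (the table, key, and attribute names are assumptions):

    import boto3
    from boto3.dynamodb.conditions import Key, Attr

    # "Given a user-id and a time-range, fetch a list of pending order-ids."
    table = boto3.resource("dynamodb").Table("orders")  # assumed name

    resp = table.query(
        KeyConditionExpression=(
            Key("user_id").eq("user-123")
            & Key("created_at").between("2022-07-01", "2022-07-14")
        ),
        FilterExpression=Attr("status").eq("PENDING"),
    )
    pending_order_ids = [item["order_id"] for item in resp["Items"]]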

There's a lot more you could do though; DynamoDB is, after all, a wide-column KV store. Ref this re:Invent talk from 2018: https://www.youtube-nocookie.com/embed/HaEPXoXVf2k

Apart from being fully-managed, the key selling points of DynamoDB are its consistent performance for a given query type, read-your-writes consistency semantics, auto-replication, auto-disaster recovery.

See also: https://martinfowler.com/bliki/AggregateOrientedDatabase.htm... (mirror: https://archive.is/lc2eO)


The AWS re:Invent lecture was great and answered exactly when to use DynamoDB. I might seriously consider it for some of my applications for sure.


I always tell people there are two clear areas where DynamoDB has some major benefits:

- Very high scale applications that can be tough for an RDBMS to handle

- Serverless applications (e.g. w/ AWS Lambda) due to how the connection model (and other factors) work better with that model.

Then, for about 80% of OLTP applications, you can choose either DynamoDB or RDBMS, and it really comes down to which tradeoffs you prefer.

DynamoDB will give you consistent, predictable performance basically forever, and there's not the long-term maintenance drag of tuning your database as your usage grows. The downside, as others have mentioned, is more planning upfront and some loss of flexibility.


Lots of records (billions), low/no relational linkage, the need to query/update records in different ways (i.e., you need indexes), the need for HA and scaling (i.e., perhaps you can be VERY bursty and read heavy).

It's not one size fits all, but at least in my line of work there are few instances where it's a pretty good fit.


If you have a database access layer then structuring your application shouldn't be that different. I wouldn't deal with the database directly unless I had a really good reason or the abstraction layer didn't support the query I was trying to run.


DynamoDB is amazing, but not very flexible once you have designed your database. No abstraction layer will allow you to run queries ad-hoc in a performant way.


It's true. 400 KB max item size, too. 1 MB max query response size, I believe. Good luck grabbing a shit load of data at once without a parallel scan.

Dynamo is a precision tool and it’s great at those specific workloads but it’s not a one size fits all by any means.


400 KB is the max item size; the pattern to get around that is to store the objects in S3 and the URLs/keys to those objects in DDB.
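A minimal sketch of that pattern, with the bucket and table names invented:

    import boto3

    # Large payload lives in S3; DynamoDB holds the metadata plus a pointer.
    s3 = boto3.client("s3")
    table = boto3.resource("dynamodb").Table("documents")  # assumed name

    def save_document(doc_id: str, body: bytes, metadata: dict) -> None:
        key = f"documents/{doc_id}.json"
        s3.put_object(Bucket="my-large-objects", Key=key, Body=body)
        table.put_item(Item={
            "PK": f"DOC#{doc_id}",
            "SK": "META",
            "s3_key": key,  # the pointer; the item stays well under 400 KB
            **metadata,
        })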


> No abstraction layer will allow you to run queries ad-hoc in a performant way.

Depends on the size of the data. Run analytics queries (i.e. things that return summary data not all rows) on 10GB of data through clickhouse or duckdb or datafusion and they'll generally return in milliseconds.


What does this have to do with DynamoDB? The point is that once you've gotten your data into DynamoDB, you're strongly limited in how you can use it until you load it into something else.


I didn't see an obvious connection between the two sentences.


An access layer doesn't change your access patterns, which is what actually determines the database model to use.

DynamoDB (and other similar key/value stores) make very big trade-offs for speed and scale that most applications don't need.


DynamoDB usage is heavily based around correctly structuring your keys, allowing you to do things like query subsets easily. This in turn means you need to know what your usage patterns will be like so you can correctly structure your keys.

God help you if you need to make major changes to this down the road.

A database access layer can't do this for you; that just isn't what it does.


totally, or s3


> Designing an application specifically for DynamoDB will take _a lot_ of time and effort.

If you can write, read, and query a JSON document using an API in your application, it's literally that simple.
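For example, with boto3 it's roughly this (table and key names are illustrative):

    import boto3

    table = boto3.resource("dynamodb").Table("app-table")  # assumed name

    # Write a JSON-shaped document...
    table.put_item(Item={
        "PK": "USER#42",
        "SK": "PROFILE",
        "email": "ada@example.com",
        "prefs": {"theme": "dark"},
    })

    # ...and read it back by key.
    item = table.get_item(Key={"PK": "USER#42", "SK": "PROFILE"}).get("Item")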

The only real time and effort is the architectural decisions you make up front, and that's about it. And there are some great guides out there that cover 99% of those architectural decisions.

As a user of both, I find MySQL replication and clusters to be far more complex and time and effort intensive.


Have to disagree on this one. Something as basic and out-of-the-box as a migration / data backfill is not only complicated but also very expensive (both time- and cost-wise) on Dynamo. Not to mention all the other things that come nicely with a relational db (type checking, auto increments, uniform data).


To be fair, the parent discusses designing an application to use Dynamo, not data migration.

I'll completely agree with you on migration / backfill. You're going to pay a lot of money to migrate a ton of data into Dynamo, and you'll also definitely increase the complexity in provisioning and setting up that migration pattern.

But my comment stands pretty well considering greenfield application development around Dynamo.


> If you can write, read, and query a JSON document using an API in your application, it's literally that simple

You could say that of Elasticsearch or Mongo, too. And it might be technically true, but you haven't scratched the surface of mappings, design, limitations, etc.

You can dump a bunch of data into Dynamo very easily, but what about getting data via secondary indices when you can't get your data with the views you've built without scanning? How do you use partition keys in it? And so on.


> The only real time and effort is the architectural decisions you make up front, and that's about it

And don't forget about the time spent fixing what could have been caught by types and regular old db constraints (for most applications).


It's a question of change resilience. You can implement CRUD on a single object with DDB trivially. You can't implement 5 different list-by-X-property APIs trivially, or filter the objects, or deal with foreign keys…


> From the paper [0]: DynamoDB consists of tens of microservices.

Ha! For folks who think two-pizza teams mean 100s of microservices... this is probably the second most scaled-out storage service at AWS (behind S3?), and it runs tens of microservices (pretty sure these aren't micro the way most folks would presume 'em to be).

> What's exciting for me about this paper is that it covers DynamoDB's journey...

Assuming these comments are true [1][2], in classic Amazon fashion [3], the paper fails to acknowledge a FOSS database (once?) underneath it: MySQL/InnoDB (and references it as a B-tree instead).

[0] https://web.archive.org/web/20220712155558/https://www.useni...

[1] https://news.ycombinator.com/item?id=13173927

[2] https://news.ycombinator.com/item?id=18871854

[3] https://archive.is/T1ZNJ


I'm not sure about DDB, but I know in AWS in general building a new service does not give you credit by default. It's not like the shit Uber promoted: Yeah! We have 8000 services. Look how great we are! In fact, people usually question it if someone proposes to create a new service. Working Backwards (i.e., solving real user problems) and Invent and Simplify are indeed two powerful leadership principles. And of course, the sheer amount of work involved in setting up a new service is so much that people have to think twice before starting one.


Lots can change over the years. Your links are from 2016 - is it not conceivable that in the last 6 years Amazon has changed some of the implementation?


DynamoDB was already large scale at that time.

The point is: the number of services doesn't need to scale with the level of demand.


My comment was about "the paper fails to acknowledge a FOSS database (once?) underneath it: MySQL/InnoDB (and references it as B-Tree instead)". I should have been clearer.


I've found DDB to be exceptional for use cases where eventual consistency is OK and you have a few well-defined query patterns. This is a large number of use cases, so it's not too limiting. As the number of query patterns grows, indices grow, and costs grow (or pray for your soul if you attempt to use DDB transactions to write multiple keys to support differing query patterns). If you need strong consistency, your cost and latency also increase.
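For reference, the "transactions to write multiple keys" approach warned about above looks roughly like this with the low-level client (table and key names are my own; each extra copy costs extra writes):

    import boto3

    client = boto3.client("dynamodb")

    # Keep two copies of the same fact (two query patterns) in sync by
    # writing both items in one transaction.
    client.transact_write_items(TransactItems=[
        {"Put": {"TableName": "app-table",
                 "Item": {"PK": {"S": "USER#123"}, "SK": {"S": "ORDER#900"}}}},
        {"Put": {"TableName": "app-table",
                 "Item": {"PK": {"S": "ORDER#900"}, "SK": {"S": "BY_USER#123"}}}},
    ])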

Oh, and I'd avoid DAX. Write your own cache layer. The query cache vs. item cache separation[1] in DAX is a giant footgun. It's also very under supported. There still isn't a DAX client for AWS SDK v2 in Go for example[2].

1 - https://docs.aws.amazon.com/amazondynamodb/latest/developerg...

2 - https://github.com/aws/aws-dax-go/issues/2


The way that I learnt the ins and outs of DynamoDB (and there is a lot to learn if you want to use it effectively) is by implementing all the Redis data structures and commands on it. That helped me understand both systems in one shot.

The key concept in Dynamo is that you use a partition key on all your bits of data (my mental model is that you get one server per partition) and you then can arrange data using a sort key in that partition. You can then range/inequality query over the sort keys. That’s the gist of it.
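In boto3 terms, that model is just the HASH/RANGE key schema on a table; a minimal sketch with an invented table name:

    import boto3

    dynamodb = boto3.client("dynamodb")

    dynamodb.create_table(
        TableName="kv-demo",  # assumed name
        KeySchema=[
            {"AttributeName": "PK", "KeyType": "HASH"},   # partition key
            {"AttributeName": "SK", "KeyType": "RANGE"},  # sort key
        ],
        AttributeDefinitions=[
            {"AttributeName": "PK", "AttributeType": "S"},
            {"AttributeName": "SK", "AttributeType": "S"},
        ],
        BillingMode="PAY_PER_REQUEST",
    )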

The power and scalability comes from the fact that each partition can be individually allocated and scaled, so as long as you spread over partitions you have practically no limits.

And you can do quite a bit with that sort key range/inequality thing. I was pleasantly surprised by how much of Redis I could implement: https://github.com/dbProjectRED/redimo.go


Nice write-up from Marc. This definitely hits on the most common problems distributed systems face. I haven't read the paper yet but it is pretty cool they published this and talk about changes over time.

1. Managing 'heat' in the system (or assuming that you'll have an uniform distribution of requests)

2. Recovering a distributed system from a cold state and what that implies for your caches.

3. The obvious one that people who do this type of thing spend a lot of time thinking about: CAP theorem shenanigans and using Paxos.

Reminds me of the Grugbrained developer on microservices: https://grugbrain.dev/#grug-on-microservices

Good luck getting every piece working on the first major recovery. My 100% unscientific hunch is that most folks aren't testing their cold-state recovery from a big failure, much like how folks don't test their database restoration solutions (or historically haven't).


These days I'd probably take a closer look at Spanner. It is a consistent and scalable db. It makes life much easier for developers.

Like Cassandra, dynamodb requires the data model to be designed very carefully to be able to get the max out of them.

More often than not, that simply adds more complexity; people often underestimate how much a sharded mysql/Postgres can scale.

My default choice for the longest time: Postgres for the data I care about, ES as secondary index and S3 as blob storage.


True. Spanner and the likes of Spanner, CockroachDB, YugaByte all are strongly consistent and scalable dbs. The greatest advantage IMO is the ability to just use SQL without having to worry about carefully designing a data model. What bothers me however is that these data stores are not truly relational data stores. They spin a relational layer on top of a scalable key-value data store.

Is it necessary to use a strongly consistent transactional data store if your needs don't demand transactions (by transactions I mean 2PC)? IMO you are still better off with DynamoDB/Cosmos/MongoDB for eventual consistency use cases. The reason being, you have to resort to a data model if you don't need the relational layer, in YugaByte at least, not sure about Spanner. So why bother with YugaByte if I am resorting to a data model? Might as well stick with DynamoDB.


What part of SQL requires not having to design a data model? What exactly do you mean by that?

And technically all relational databases are relational layers on top of a key/value subsystem. Splitting that apart and scaling the storage is how most of the NewSQL databases scale, from CRDB to Yugabyte to Neon.


What I mean is that the data model in the NoSQL world is tightly coupled with the query pattern. So you define the query pattern and then tailor the data model to that query pattern. In the relational world, you typically choose the index based on your query pattern. Not tailor the data model to your query pattern. You follow normalization principles.

Of course, every data store needs a data model. No debate there :).

Not sure what you mean by relational databases being relational layers on top of key-value stores. InnoDB has a 16KB page as its fundamental data structure.

https://dev.mysql.com/doc/internals/en/innodb-page-structure...


I think VoltDB (and SciDB) are worth checking out also. I'm seeing some very impressive ACID-compliant TPS with Elixir connected to VoltDB. I don't like having to pay to get distributed features, however (the open source community edition is feature-gimped compared to paid).


You seem to think MongoDB is eventually consistent. MongoDB is designed as strongly consistent database. You can choose to query a secondary and that will be eventually consistent but that is not the default behaviour.


I should have clarified: what I mean by strong consistency is that you have transactional support (2-phase commit). MongoDB is atomic, meaning every single document is strongly consistent. However, I stand corrected. They have introduced transactions as of 4.2.


I believe multi document transactions offer only causal consistency?


> What bothers me however is that these data stores are not truly relational data stores

Suggests there may be an impossibility theorem lurking somewhere.


Global strict serializability is coming to Cassandra very soon [1]

[1] https://cwiki.apache.org/confluence/download/attachments/188...


One big benefit of DynamoDB over RDS on AWS is that the access layer is API based so you don’t have issues with held open connections when accessing via AWS Lambda.


RDS proxy should fix this, but the proxy team is out of sync with the RDS team. I’ve seen RDS ahead of proxy by two major versions.


An underrated part of DynamoDB are its streams. You can subscribe to changes and reliably process those in a distributed way. If you're comfortable with the terms "at-least once delivery" and "eventual consistency", you can build some truly amazing systems by letting events propagate reactively through your system, never touching a data store or messaging broker other than DynamoDB itself.
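A bare-bones sketch of consuming a stream from a Lambda handler (the record fields follow the documented stream event shape; the processing function is a placeholder):

    # Records arrive at-least-once, so the processing should be idempotent.
    def handler(event, context):
        for record in event["Records"]:
            if record["eventName"] in ("INSERT", "MODIFY"):
                new_image = record["dynamodb"].get("NewImage", {})
                process_change(new_image)  # e.g. update a projection,
                                           # emit a domain event, etc.

    def process_change(image):
        # Placeholder: attribute values arrive in DynamoDB's typed JSON
        # form, e.g. {"email": {"S": "ada@example.com"}}.
        ...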

It's not for everyone, but when you get a team up and running with it, it can be shockingly powerful.


Yeah we make use of streams at my work. Really useful. You can hook up streams to a Lambda and have it process events and flow them downstream to a Data Lake or Data Warehouse for analytic workloads. What works really well is pushing data to an S3 bucket with object versioning and replication enabled.

I think DynamoDB streams and Kinesis streams work similarly under the hood? But DynamoDB streams are way cheaper; pricing is on-demand compared to hourly for Kinesis.


DynamoDB is (edit: can be) extremely expensive compared to alternatives (e.g. self hosted SQL).

Make sure the benefits (performance, managed, scale) outweigh the costs!


I'd put emphasis on the "can be", it very much depends on your read/write patterns and configuration. Assuming you know those up front and they fit dynamo well it can be several times cheaper than any sort of SQL.

Every time I've made use of it it's ended up costing pennies compared to what SQL would cost, sometimes literal pennies. If you turn on all the fancy features from day 1 and fill it with tons of data you don't need and make too many reads/writes per-request though you can get into very pricey territory very quickly.

We tried to aim for 1 read and/or 1 write per request to our service and that worked really well for our use cases. It kept costs low and performance high but we had a really well understood problem. If I was a startup and didn't know quite how my product would turn out, I don't think I'd consider dynamo for a while.


A word of caution. The default limit for number of tables per AWS account, for DynamoDB is 2500.

Tables are a scarce resource and you want to use single table designs for each app.

The design of tables with DDB is fascinating. Once you understand the PK / SK / GSI dance, design becomes so intuitive.


I wonder how Cassandra is doing? I heard companies are migrating away from it.


I'd like to learn more about their MemDS. Afaik nothing has been made public.


How well does DynamoDB scale when paired with AppSync and GraphQL? The selling point here being that you can use GQL as your schema for the DB too and get automatic APIs for free.


I've done this. It works really, really well to start off with - your API basically is your schema, and you're done.

There's definitely more work later on when your API and data model start diverging (which they always will). Overall it was a decent experience, and DynamoDB has made some really important QOL improvements over the last 5 years, too.

It's still not relational, which means it's very different and you'll be committed to a totally different way of thinking about things for a while.


Just fine?


I should have made it clear: I was hoping to get some folks to talk about their experience using it this way. I haven't found a lot in terms of real-world evaluation of it.

It can also use Aurora Serverless V2, and I am curious about that as well, FWIW


Good job! But I'm wondering when Amazon will start contributing to the open source world...



Ohh, you are right, sorry Amazon, I didn't notice this. But I'm still hoping to see your contributions to the system kernels of databases, big data, etc.



