> this seems to just be a SQLite database with values in fields?
SQLite is used as a storage format ("SQLite competes with fopen()"). The key-value pairs are stored as a modified append-only CRDT. The LUB operation (which merges two states while syncing) is implemented here: https://github.com/maxmunzel/kvass/blob/e32fdabdc86b039f716c...
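To make the "append-only CRDT with a LUB merge" idea concrete, here is a minimal sketch. It is not kvass's actual code: the `Entry` type, the opaque `version` tuple, and the `merge`/`lookup` names are all assumptions made for illustration. The state is a grow-only set of entries, the least upper bound of two states is their union, and a read picks the entry with the largest version.

```python
from typing import FrozenSet, NamedTuple, Optional, Tuple


class Entry(NamedTuple):
    key: str
    value: str
    version: Tuple[int, ...]  # hypothetical ordering token; the larger one wins


State = FrozenSet[Entry]


def merge(a: State, b: State) -> State:
    """Least upper bound of two grow-only sets: their union."""
    return a | b


def lookup(state: State, key: str) -> Optional[str]:
    """Return the value of the newest entry for `key`, or None if absent."""
    candidates = [e for e in state if e.key == key]
    if not candidates:
        return None
    return max(candidates, key=lambda e: e.version).value


# Two replicas that diverged while offline (made-up data).
laptop: State = frozenset({Entry("motd", "hello", (100,))})
server: State = frozenset({
    Entry("motd", "hi there", (105,)),
    Entry("theme", "dark", (90,)),
})

merged = merge(laptop, server)
print(lookup(merged, "motd"))  # prints "hi there"
```

Because set union is commutative, associative, and idempotent, it does not matter in which order, or how often, two replicas exchange their states.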
> anyone with access to the file would be able to see all data stored?
Yes, attackers with access to your filesystem are not part of my attacker model. I rely on disk encryption for that.
> Do the clients cache data locally? It looks like you're basically syncing from the server for every request. You're already making a round trip to the server for a request anyway, so why not keep state only on the server? I can understand an offline-only mode, but this would require a significantly more robust sync mechanism. If this was the goal, I'd love to see this discussed more in the README too.
The sync mechanism is actually pretty solid, as it's based on CRDTs. One of the applications of kvass is central management of config files, so automatic syncing and offline fallback are important.
> What is the purpose of the ProcessID?
The Counter variable is a rudimentary implementation of Lamport clocks. To get a total order from Lamport clocks, you need ordered, distinct process IDs. The process IDs don't really need to mean anything, and the Lamport clock is itself just a fallback for the case where the wall-clock timestamps collide (see the Max() function), so it's practical to just draw them randomly.
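As a rough illustration of that ordering (a sketch, not the real Max() function), the code below compares wall-clock time first, then the Lamport counter, then the random process ID. The `Stamp` and `LamportClock` names, the 32-bit ID width, and the `observe` method are assumptions for the example.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stamp:
    wall_time: int   # e.g. Unix seconds
    counter: int     # Lamport counter, used only when wall_time ties
    process_id: int  # random ID, used only when the counter ties as well

    def sort_key(self):
        return (self.wall_time, self.counter, self.process_id)


def newer(a: Stamp, b: Stamp) -> Stamp:
    """Pick the later of two stamps under the total order above."""
    return a if a.sort_key() >= b.sort_key() else b


class LamportClock:
    def __init__(self, process_id: Optional[int] = None):
        # Hypothetical choice: draw a random 32-bit ID when none is supplied.
        self.process_id = random.getrandbits(32) if process_id is None else process_id
        self.counter = 0

    def tick(self) -> int:
        """Advance the counter for a local event."""
        self.counter += 1
        return self.counter

    def observe(self, remote_counter: int) -> int:
        """Standard Lamport rule: jump past any counter seen from a peer."""
        self.counter = max(self.counter, remote_counter) + 1
        return self.counter
```

With random IDs, two processes could in principle draw the same value; a wider ID makes that collision less likely.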
> I didn't see any authn/authz in the requests. You're also unmarshalling random data from the request w/o confirming that it is valid first. This seems risky to me and could potentially crash the server if I were to send it random data.
Authentication is provided by the GCM mode of AES. As I decrypt (and thereby verify) early, I can assume I'm working on trustworthy payloads. GCM is also non-malleable, unlike, for example, CBC or CTR.
As suggested by losfair, I'll switch to PSK TLS as soon as it's available or just put HTTPS in front of the end-points. But that's not high-priority right now.
SQLite allows me to keep multiple versions of the same entry, which is convenient for state merging. Half the sync logic is actually implemented in SQL. Other than that, I’m already familiar with it and the storage backend is not very performance critical for the intended use case.
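A sketch of what "multiple versions of the same entry" with the resolution done in SQL could look like. The table layout, column names, and helper functions are assumptions for illustration and are not taken from the kvass schema.

```python
import sqlite3

# Hypothetical schema: one row per version, identified by its full stamp.
SCHEMA = """
CREATE TABLE IF NOT EXISTS entries (
    key        TEXT    NOT NULL,
    value      TEXT    NOT NULL,
    timestamp  INTEGER NOT NULL,
    counter    INTEGER NOT NULL,
    process_id INTEGER NOT NULL,
    PRIMARY KEY (key, timestamp, counter, process_id)
)
"""


def open_store(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def add_version(conn, key, value, timestamp, counter, process_id):
    # INSERT OR IGNORE turns a re-sent version into a no-op.
    conn.execute(
        "INSERT OR IGNORE INTO entries VALUES (?, ?, ?, ?, ?)",
        (key, value, timestamp, counter, process_id),
    )


def current_value(conn, key):
    # The "winning" version is selected entirely in SQL.
    row = conn.execute(
        "SELECT value FROM entries WHERE key = ? "
        "ORDER BY timestamp DESC, counter DESC, process_id DESC LIMIT 1",
        (key,),
    ).fetchone()
    return row[0] if row else None


store = open_store()
add_version(store, "vimrc", "set number", 100, 1, 42)
add_version(store, "vimrc", "set relativenumber", 105, 1, 17)
add_version(store, "vimrc", "set number", 100, 1, 42)  # duplicate, ignored
print(current_value(store, "vimrc"))  # prints "set relativenumber"
```

Older versions stay in the table, so merging a peer's state amounts to inserting its rows.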
Redis is in-memory, so it's prohibitive for big files. Also, kvass still works if it's disconnected from the server. This is important if you want to use it for config files.
On the other hand, using Redis (/skate) for storing files was the inspiration for creating kvass.
Mainly self-hosting and generating shareable URLs. If your keys end in ".html", the MIME type is even set accordingly and you can use it for toy websites ;)
This is by no means meant to replace the backend of your app. It's more of an alternative to USB sticks and Google Drive.
Thanks for the critique! I wanted to use symmetric crypto as it's trivial to use without domains and certificates. The possibility of replays is a non-issue, as the key-value store is implemented as a CRDT and therefore all operations are idempotent.
On the other hand, I didn't anticipate replay attacks in the design and thanks to your comment, I'll keep them in mind should I ever find myself in a scenario where they are undesirable...
It doesn't matter if the operations are idempotent. The point is that an eavesdropper can replay a message that sets a key, for example, overwriting whatever was there previously.
It would be better to use an established cryptography system. You could do self-signed certs with TLS, like Syncthing does. Or just use SSH.
If the CRDT part is done correctly, then replaying a message that sets a key will not change anything, ever.
If the message is:
Key: Foo
Reference CRDT node ID: 7654321 (the last node that the client knows of that updated the value of ‘Foo’)
Operation: Update
Value: Bar
The ID of this new node: 1122112211
(Omitted for simplicity: Timestamps, hashes, …)
Replaying that message won’t do anything if the target already knows about the existence of that new node.
If the target didn’t know about the node, then I guess you’re helping them sync their own data? Maybe they owe you a thanks? If you knew what each encrypted message contained, you might be able to do some split-state shenanigans; for example: replay the message that sets a “PasswordAuthEnabled” key to “Yes” but deliberately omit the message that changes the “Password” key from its default of “password” to a genuine password. It’s very hard to imagine an actual situation like this occurring, but I guess that’s what makes crypto (and designing secure systems in general) so damn tricky. That and the math. And end users. And…
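The replay argument above can be sketched in a few lines. This is a toy model under stated assumptions, not any real CRDT library: the `Message` fields mirror the example message, while `Replica`, `apply`, `get`, and the "highest node ID wins among concurrent heads" tie-break are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Message:
    key: str
    parent_id: Optional[int]  # the "reference CRDT node ID"
    operation: str
    value: str
    node_id: int              # the ID of the node this message creates


class Replica:
    def __init__(self) -> None:
        self.nodes: Dict[int, Message] = {}

    def apply(self, msg: Message) -> bool:
        """Add the node if it is unknown; return whether state changed."""
        if msg.node_id in self.nodes:
            return False  # replay of a known node: nothing happens
        self.nodes[msg.node_id] = msg
        return True

    def get(self, key: str) -> Optional[str]:
        nodes = [m for m in self.nodes.values() if m.key == key]
        superseded = {m.parent_id for m in nodes}
        heads = [m for m in nodes if m.node_id not in superseded]
        if not heads:
            return None
        # Assumed tie-break between concurrent heads: highest node ID.
        return max(heads, key=lambda m: m.node_id).value


first = Message("Foo", None, "Update", "Baz", 7654321)
second = Message("Foo", 7654321, "Update", "Bar", 1122112211)

replica = Replica()
replica.apply(first)
replica.apply(second)
replayed = replica.apply(first)  # an eavesdropper re-sends the old message
print(replayed, replica.get("Foo"))  # prints "False Bar"
```

The old message cannot overwrite the newer value, because its node is already known and is already superseded by node 1122112211.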
I see, thanks. I was focusing on the "idempotent" part, but yeah, a CRDT would protect against replays. It's still not a great design, though: it opens you up to issues if not all messages are part of the CRDT, or if you have a buggy CRDT implementation.
It's a shame that the meaning of 'idempotent' has gotten watered down by half-assed implementations. The original NFS paper from Sun [0] claims that write operations are idempotent, but they aren't really. Not if another operation has occurred. Like in:
write '1' @ 0
write '2' @ 0
write '1' @ 0 (replayed through a duplicated packet)
the duplicated write RPC reverts the second write. Duplicated link and rename RPCs are even worse. They added a replay detection cache in the server later to prevent some common error cases, but it fails if the server reboots in the middle.
Anyway, CRDT correctness is hard enough that I'd be reluctant to trust it against an adversary who can inject replays.
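The NFS write sequence described above is easy to reproduce with a toy model. The `apply_writes` helper is hypothetical and only overwrites bytes in a buffer; it is not an NFS client.

```python
def apply_writes(writes, size=1):
    """Apply (offset, payload) writes in order to a zero-filled buffer."""
    data = bytearray(size)
    for offset, payload in writes:
        data[offset:offset + len(payload)] = payload
    return bytes(data)


intended = [(0, b"1"), (0, b"2")]
with_replay = intended + [(0, b"1")]  # first write duplicated in transit

print(apply_writes(intended))     # prints b'2'
print(apply_writes(with_replay))  # prints b'1'
```

Each write is idempotent only in the narrow sense that applying it twice in a row equals applying it once; as soon as another write lands in between, the replay changes the outcome.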
The primary use case is shuffling files or clipboards between different computers. I also regularly use the URL-sharing capability.
Previously, I had to deal with ephemeral HTTP servers, which I didn't like from an ergonomic perspective.
Ergonomically, I find Redis nice. The problem is that it is in-memory and that encryption is cumbersome. Also, kvass can be used offline, as the kv-store is implemented as a CRDT.