Tank – A very high performance distributed log service (github.com/phaistos-networks)
136 points by olalonde on Dec 1, 2016 | 29 comments



GH is now a magic land where "high performance" means "no numbers to back this up", "secure" means "we haven't actually been audited", and "awesome" means something to the effect of "here, have this".


Some benchmarks added https://github.com/phaistos-networks/TANK/blob/master/README.... Will add consumer benchmarks later (our ops folks are on it). Apologies for not backing the "high performance" claim with numbers in the first place.


No performance numbers. No tests. Nothing about how it manages concurrency, how it actually writes to disk, how it does replication.

It appears to be a server, not a lib. Why would someone use this instead of Kafka?


It is a service, and a library (C++). You can always check the implementation, but obviously that's the wrong answer to that question :)

The README states that replication is not implemented yet. I encourage you to also check this issue https://github.com/phaistos-networks/TANK/issues/14 for some details on the I/O semantics, and the Wiki for answers to other questions: https://github.com/phaistos-networks/TANK/wiki


So single-threaded, blocking file I/O. Not very impressive or useful (as a lib).


Yes, single-threaded, because the contention on the various files and the cost of serialisation would likely negate the benefits of using multiple threads.

Network I/O is obviously asynchronous - you may want to check the codebase. Disk I/O is synchronous, but:

- it uses sendfile() and readahead() to reduce or eliminate the likelihood of blocking reads (sketched below): see https://github.com/phaistos-networks/TANK/issues/14#issuecom...

- AIO is either broken or only properly supported on XFS, depending on the kernel release (and, in the past, appending to a file on an XFS filesystem could block and degrade performance). We're also not using Direct I/O, so writes land in memory and only get flushed when the commit count is hit, or periodically - so, particularly for local disks (HDDs or SSDs), in practice this never blocks for more than a few ms when flushing, if at all.
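
To make the read path above concrete, here's a simplified sketch of that readahead() + sendfile() combination (illustrative only - stream_segment() is a made-up helper name, not code from the repo):

    // Simplified sketch of the readahead() + sendfile() pattern described above;
    // stream_segment() is a hypothetical helper, not a function from the repo.
    #include <fcntl.h>        // readahead() (glibc, Linux-specific)
    #include <sys/sendfile.h> // sendfile()
    #include <unistd.h>
    #include <cerrno>

    // Stream `len` bytes of an immutable log segment to a non-blocking client socket.
    ssize_t stream_segment(int client_fd, int segment_fd, off_t offset, size_t len) {
        // Hint the kernel to prefetch the range, so the sendfile() calls below
        // are unlikely to block on disk reads.
        readahead(segment_fd, offset, len);

        size_t remaining = len;
        while (remaining) {
            const ssize_t n = sendfile(client_fd, segment_fd, &offset, remaining);
            if (n == -1) {
                if (errno == EAGAIN)
                    break;      // socket send buffer is full; resume when it's writable again
                return -1;      // real error
            }
            if (n == 0)
                break;          // reached EOF on the segment
            remaining -= size_t(n);
        }
        return ssize_t(len - remaining); // bytes actually handed to the kernel for this client
    }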


> Yes, single-threaded, because the contention on the various files and the cost of serialisation would likely negate the benefits of using multiple threads.

Multithreaded does not mean multiple files.

> We're also not using Direct I/O, so writes land in memory and only get flushed when the commit count is hit, or periodically - so, particularly for local disks (HDDs or SSDs), in practice this never blocks for more than a few ms when flushing, if at all.

Unless you are actually using it under a high-throughput scenario, which is why you would use a lib like this. It will work great until you hit the actual flush point, then possibly block for seconds, even minutes.

If your performance needs are high, mandating XFS is not unreasonable.


It's totally unreasonable to mandate a particular FS for modern general purpose software.

Multiple files do mean multiple threads. Unless you manage to run this without a modern operating system.


> It's totally unreasonable to mandate a particular FS for modern general purpose software.

That is completely debatable. We are talking about "a high performance distributed log service", not a word processor.

> Multiple files does mean multiple threads.

Which is something entirely different from what I said...


You are making all kinds of assumptions without knowing anything about the implementation.


They address this in the README by saying you should use Kafka.


They say Tank is a good choice if you need simplicity and performance. Yet they do not provide any evidence of it being faster than Kafka, or any indication of it being fast at all.

If you're not mentioning kernel calls, allocations, thread synchronization, I/O mechanism, etc., you're not going to be very believable.


Please see the earlier comment. Specifically, re: sys calls and concurrency, you may want to start from this issue: https://github.com/phaistos-networks/TANK/issues/14


Starting from that, you may not want to dismiss AIO & O_DIRECT in the span of a paragraph.


I don't see any information other than a claim that it's "engineered for optimal (very high) performance": https://github.com/phaistos-networks/TANK/wiki/Why-Tank-and-...


No JVM required. Way easier to set up than Kafka. I'm sold :)


Is a headless install of the Oracle JVM that hard? Isn't it just a Puppet or a Chef snippet?


I think I (or maybe someone else?) need to run some benchmarks and produce some meaningful comparison metrics -- as pointed out by commenters here, and elsewhere. I suppose I should have done that already. Apologies for the lack of concrete numbers. There's some information about performance in the Wiki and Issues though.


I'm looking for something like Kafka with an at-least-once guarantee. I believe this can be achieved with the Kafka Java client (not sure on that), but librdkafka (C++ client) doesn't seem to support this guarantee. Performance is secondary to messages not getting dropped in my use cases.

What kind of guarantees does Tank make?


The Tank Client will immediately publish to Tank (it doesn't buffer requests). You get at-least-once semantics with Tank (exactly-once pretty much means at-least-once plus dupes detection).
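
To sketch what I mean by dupes detection (illustrative types only, not the actual Tank client API): redeliveries carry the same broker-assigned sequence number, so a consumer that remembers the highest sequence number it has already applied can simply drop them.

    // Illustrative only (not the Tank client API): at-least-once delivery plus
    // duplicate detection by sequence number gives effectively-once processing.
    #include <cstdint>
    #include <string>

    struct Message {
        uint64_t    seq;     // broker-assigned, strictly increasing per partition
        std::string payload;
    };

    struct DedupingConsumer {
        uint64_t next_seq = 0; // lowest sequence number we have not applied yet

        // Returns true if the message was applied, false if it was a duplicate.
        template <typename Apply>
        bool consume(const Message &m, Apply &&apply) {
            if (m.seq < next_seq)
                return false;     // redelivery of something already applied
            apply(m);             // the side effect happens once
            next_seq = m.seq + 1; // persist this together with the side effect
            return true;
        }
    };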


So if I have a subscriber that simply publishes a transformed message onto another topic, can I have a guarantee that if the publish fails it won't move on to the next message in the subscription?


The consumer applications (which interface with Tank brokers via a Tank client library) are responsible for maintaining the sequence number they are consuming from.

Suppose one such application is consuming every new event that's published ("tailing the log"). As soon as another application successfully publishes one or more new messages, the consumer will get them immediately. If the application that attempted to publish failed to do so, or didn't get an ACK for success, then you are guaranteed that no new message(s) were published (i.e. no partial message content).

I am not sure if that answers your question; if not, can you please elaborate?


I believe so. I suppose I'm asking for an abstraction that makes maintaining the sequence number simple and fails safely in the presence of errors.

I'd basically like to be able to map messages from one topic to another with a guarantee that none of those messages will be lost, even when some error occurs (a programming error, system downtime, or a network partition). I'd prefer the application to stop producing messages rather than lose any of them.
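
Roughly, what I have in mind is something like the loop below (generic callbacks rather than any particular client API), where progress is only committed after the downstream publish has been ACKed, so any failure stops the loop instead of skipping messages:

    // Sketch of the topic-to-topic mapping I'm describing, written against generic
    // callbacks (not any particular client API). The stored sequence number only
    // advances after the transformed message has been ACKed, so a failure stops
    // the pipeline instead of dropping data.
    #include <cstdint>
    #include <functional>
    #include <string>
    #include <vector>

    struct Record {
        uint64_t    seq;
        std::string payload;
    };

    void forward_loop(uint64_t start_seq,
                      const std::function<std::vector<Record>(uint64_t)> &fetch,       // read source topic from seq
                      const std::function<std::string(const std::string &)> &transform,
                      const std::function<bool(const std::string &)> &publish,         // true only on broker ACK
                      const std::function<void(uint64_t)> &save_seq) {                 // durably record progress
        uint64_t next = start_seq;
        for (;;) {
            for (const Record &r : fetch(next)) {
                if (!publish(transform(r.payload)))
                    return;         // stop here; on restart we resume from `next` and
                                    // re-process (at-least-once) rather than lose messages
                next = r.seq + 1;
                save_seq(next);     // commit progress only after the ACK
            }
        }
    }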

It sounds like that is possible with Tank so I may end up giving it a try.


FWIW exactly once is being worked on now: https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+E...


Looks like it is abandoned; the first commit was in June and the last in September.


I am the core developer. It's not abandoned at all. There are updates that haven't been pushed upstream, but no new features. I was going to support replication via an external consensus service (etcd, Consul, etc.) -- but it looks like implementing Raft directly in Tank for interfacing with other cluster nodes is a better idea, all things considered (no external deps., simplicity).

The reason this hasn't happened yet is that, other than a lack of free time to pursue it, we (at work) haven't really needed that feature yet. We run a few instances, they are very idle, and we can also mirror to other nodes (via tank-cli).


Can you please please please use the etcd implementation[1] of Raft and not the usual go-raft or Consul Raft implementations? They've done some serious-business fault injection and integration testing with etcd as part of Google's hosted Kubernetes (GKE). There are still some lingering issues with Consul at scale that make me a bit gun-shy. Mesosphere did some of this work themselves: https://mesosphere.com/blog/2015/10/26/etcd-mesos-kubernetes..., but I know that Google engineers have done tons of work on this as well.

[1] https://github.com/coreos/etcd/tree/master/contrib/raftexamp...


Thank you - yes, I was planning to base the implementation on etcd's. I appreciate the heads up :)


Cool, thanks for clarifying!



