Very cool, congrats on getting this out, I think you're hitting a sweet spot! Kafka is a heavy hammer to swing for many, and scaling/merging old school bulk streams can be a mess.
I like that somebody actually made the effort to write a log store different
from Elasticsearch, but it's a real shame that OK Log only stores string
blobs. With such a setup, it's no better than writing the same logs to
gzipped text files, sharded by timestamp.
And the communication protocols are not documented, so querying and ingesting
data are only possible through a Go-produced binary. I'd rather leave that
([edit] ingesting the data that comes in over the network) to my Fluentd
instance, which is already in place and runs as a networked daemon.
It seems to me that you should be able to use your Fluentd pipeline in place of the forwarder component; I don't think there's any magic going on there. The engineering here is in the ingest, replication, storage, and query side of things, none of which Fluentd really addresses.
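From a quick read of the code, the forward protocol looks like nothing more than newline-delimited records over a TCP connection. If that's right, a Fluentd output plugin (or a few lines of Python) could speak it directly. A rough sketch, with the ingest address made up and the protocol assumption unverified:

    import socket
    import sys

    # Minimal stand-in for the forwarder. This assumes the ingest
    # protocol is plain newline-delimited records over a TCP connection,
    # which is what the forwarder code appears to do -- verify against
    # the source, since the protocol isn't documented.
    INGEST_ADDR = ("ingest.example.com", 7651)  # hypothetical host/port

    def forward(records):
        """Ship an iterable of log lines to an ingest node."""
        with socket.create_connection(INGEST_ADDR) as sock:
            for record in records:
                # One record per line, terminated by \n.
                sock.sendall(record.rstrip("\n").encode("utf-8") + b"\n")

    if __name__ == "__main__":
        forward(sys.stdin)  # e.g. tail -f app.log | python forward.py

If that's really all the forwarder does, wrapping it in a plugin for your existing pipeline is mostly boilerplate.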
The problem is that instead of submitting logs using a plugin (written in
Ruby), I need to (a) spawn a separate process using (b) an unrelated binary.
I should be able to stay inside one process, and I shouldn't need to ship
a binary to my log routers (a plugin in Ruby or Python is just text, so it's
easier to distribute with configuration management).
The same goes for querying: right now the only documented way is to call the
binary.
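If the store's query API were documented, I could skip the binary there too.
Something like this would probably do, though the address, endpoint, and
parameters below are pure guesses on my part, which is exactly the problem:

    import urllib.parse
    import urllib.request

    # Hypothetical query against a store node's HTTP API. The path and
    # parameter names (/query with from/to/q) are guesses at what the
    # query binary sends over the wire -- exactly the kind of thing
    # a protocol spec would pin down.
    STORE_URL = "http://store.example.com:7650"  # hypothetical address

    def query(expr, frm, to):
        params = urllib.parse.urlencode({"from": frm, "to": to, "q": expr})
        with urllib.request.urlopen(STORE_URL + "/query?" + params) as resp:
            return resp.read().decode("utf-8")

    print(query("error", "2017-01-01T00:00:00Z", "2017-01-02T00:00:00Z"))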
As I said, this is a problem with the lack of protocol specifications, not
with the architecture.