
I love OpenTelemetry and we want to trace almost every span happening. We'd be bankrupt if we went with any vendor. We wired up OpenTelemetry with Java magic (zero effort), pointed it at a self-hosted ClickHouse, and store 700M+ spans per day on a $100 EC2 instance.

https://clickhouse.com/blog/how-we-used-clickhouse-to-store-...



I've got a small personal project submitting traces/logs/metrics to ClickHouse via SigNoz. Only about 400k-800k spans per day (https://i.imgur.com/s0J6Mzo.png), but it's running on a single t4g.small with CPU typically at 11% and IOPS at 4%. I also have everything older than a certain number of GB getting pushed to an sc1 cold storage drive.

w/ 1 month retention for traces:

    ┌─parts.table─────────────────┬──────rows─┬─disk_size──┬─engine────┬─compressed_size─┬─uncompressed_size─┬────ratio─┐
    │ signoz_index_v2             │  26902115 │ 17.06 GiB  │ MergeTree │ 6.21 GiB        │ 66.74 GiB         │   0.0930 │
    │ durationSort                │  26901998 │ 5.44 GiB   │ MergeTree │ 5.40 GiB        │ 53.02 GiB         │  0.10190 │
    │ trace_log                   │ 123185362 │ 2.64 GiB   │ MergeTree │ 2.64 GiB        │ 37.96 GiB         │   0.0695 │
    │ trace_log_0                 │ 120052084 │ 2.46 GiB   │ MergeTree │ 2.45 GiB        │ 37.60 GiB         │  0.06528 │
    │ signoz_spans                │  26902115 │ 2.21 GiB   │ MergeTree │ 2.21 GiB        │ 76.73 GiB         │ 0.028784 │
    │ query_log                   │  16384865 │ 1.91 GiB   │ MergeTree │ 1.90 GiB        │ 18.31 GiB         │  0.10398 │
    │ part_log                    │  17906105 │ 846.73 MiB │ MergeTree │ 845.39 MiB      │ 3.84 GiB          │  0.21521 │
    │ metric_log                  │   4713151 │ 820.92 MiB │ MergeTree │ 806.13 MiB      │ 14.56 GiB         │  0.05405 │
    │ part_log_0                  │  15632289 │ 702.82 MiB │ MergeTree │ 701.70 MiB      │ 3.34 GiB          │  0.20490 │
    │ asynchronous_metric_log     │ 795170674 │ 576.24 MiB │ MergeTree │ 562.50 MiB      │ 11.11 GiB         │ 0.049429 │
    │ query_views_log             │   6597156 │ 461.35 MiB │ MergeTree │ 459.75 MiB      │ 6.36 GiB          │  0.07060 │
    │ logs                        │   6448259 │ 408.59 MiB │ MergeTree │ 406.65 MiB      │ 5.99 GiB          │  0.06627 │
    │ samples_v2                  │ 949110122 │ 345.01 MiB │ MergeTree │ 325.31 MiB      │ 22.09 GiB         │ 0.014382 │
If I were less stupid I'd get a machine with the recommended ClickHouse specs and save myself a few hours of tuning, but this works great.

Downsides:

- ClickHouse takes about 5 minutes to start up because my tiny sc1 drive only allows like 4 IOPS

- signoz's UI isn't amazing. It's totally functional, and they've been improving very quickly, but don't expect datadog-level polish


Thanks for mentioning SigNoz, I am one of the maintainers at SigNoz and would love your feedback on how we can improve it further.

If anyone wants to check our project, here’s our GitHub repo - https://github.com/SigNoz/signoz


I hope I'm not coming across as negative! Y'all just have a much younger product and haven't had time to do all the polish and tiny tweaks. I'm also much more familiar with Datadog, and sometimes a learning curve feels like missing features.

- I really like your new Logs & Traces Explorers. I spend a lot of time coming up with queries, and having a focused place for that is great. Especially since there's now a way to quickly turn my query into an alert or a dashboard item.

- You've also recently (6mo?) improved the autocomplete dramatically! This is awesome, and one of my annoyances with Datadog

Other feedback, and honestly this is all very minor. I'd be perfectly happy if nothing ever changed.

- where do I go see the metrics? There's no "Metrics" tab the way there's a "Logs" and "Traces" tab. A "Metrics Explorer" would be great.

- when I add a new plot, having to start out with a blank slate is not great. Datadog defaults to a generic system.cpu query just to fill something in, I find this helpful.

- when I have a plot in a dashboard and I see it is trending in the wrong direction, it would be nice to be able to create an alert directly from the chart rather than have to copy the query over.

- the exceptions tab is very helpful, but I've only recently discovered the LOW_CARDINAL_EXCEPTION_GROUPING flag. It'd be super nice if the variable part of exceptions was automatically detected and they were grouped

- One nice thing in DD is being able to preview a span from a log, or logs from a span, without opening a new page. Or previewing a span from the global page. Temporarily popping this stuff up in a sidebar would be great.

- I'm not sure if there's a way to view only root spans in the trace viewer.

- This might be a problem with the spring boot instrumentation, but I can't see how to figure out what kind of span it is. Is it a `http.request`, `db.query`, etc?


Thanks for the detailed feedback, this is gold!

> - where do I go see the metrics? There's no "Metrics" tab the way there's a "Logs" and "Traces" tab. A "Metrics Explorer" would be great.

Great idea. This is something which a few users have asked for, and we will be shipping it in a few releases.

> - when I have a plot in a dashboard and I see it is trending in the wrong direction, it would be nice to be able to create an alert directly from the chart rather than have to copy the query over.

Fair point, this is something which is also in the pipeline.

> - I'm not sure if there's a way to view only root spans in the trace viewer.

We launched a tab in the new traces explorer for this, does it not serve your use case?

> - when I add a new plot, having to start out with a blank slate is not great. Datadog defaults to a generic system.cpu query just to fill something in, I find this helpful.

We can do something like this, but we don't necessarily know the names of the metrics users are sending us, unlike Datadog, which has some default metrics that their agents generate.

Will also look into the other feedback you have given.


Oooooh this is great feedback. When I started using SigNoz I too was surprised that there was no metrics tab. I wonder what you’d expect to see after clicking it: a list of service names collecting metrics? Or some high-level system wide infra metrics? Let me know!


Are you making sure that you're applying a sample rate, but sending over all errors?

At a former place, we were doing 5% of non-error traces.


Careful, we've had systems go down under the increased load of just emitting errors, when they didn't emit much in the non-error state.


Can you go into more detail about your comment, please?


Not the GP, but:

Imagine you're sampling successful traces at, say, 1%, but sending all error traces. If your error rate is low, maybe also 1%, your trace volume will be about 2% of your overall request volume.

Then you push an update that introduces a bug and now all requests fail with an error, and all those traces get sampled. Your trace volume just increased 50x, and your infrastructure may not be prepared for that.
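
To make that arithmetic concrete, here's a tiny sketch. The request rate is made up; the 1% figures are just the assumptions from above:

    # Assumptions from the example above: 1% sampling of successes, all errors kept.
    requests_per_sec = 10_000      # hypothetical request rate, any number works
    success_sample_rate = 0.01

    def traces_per_sec(error_rate):
        return requests_per_sec * ((1 - error_rate) * success_sample_rate + error_rate)

    normal = traces_per_sec(0.01)  # ~199 traces/s, about 2% of traffic
    outage = traces_per_sec(1.0)   # 10,000 traces/s, every request is an error
    print(outage / normal)         # ~50x more load on the tracing backend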


Sorry, been busy running around all day. Basically what's happened for us on some very high transaction-per-second services is that we only log errors, or trace errors, and the service basically never has errors. So imagine a service that is getting 800,000 to 3 million requests a second, happily going along, basically not logging or tracing anything. Then all of a sudden a circuit opens on Redis, and for every single one of those requests that was meant to go through that now-open circuit to Redis, you log or trace an error. You went from a system doing basically no logging or tracing to one that is logging or tracing 800,000 to 3 million times a second.

What actually happens is you open the circuit on Redis because Redis is a little bit slow, or you're a little bit slow calling Redis, and now you're logging or tracing 100,000 times a second instead of zero; that bit of logging makes the rest of the requests slow down, and within a few seconds you're actually logging or tracing 3 million requests a second. You have now toppled your tracing system, your logging system, and the service that's doing the work. Death spiral ensues. Now the systems that call this system start slowing down and start tracing or logging more, because they're also only tracing or logging mainly on error. Or, sadly, you have other code that assumes the tracing or logging system is always up, and that starts failing and causing errors, and you get into an extra special death loop that you can only recover from by not attempting to log or trace at all during an outage like this, which means you must push a fix. All of these scenarios have happened to me in production.

In general you don't want your system to do more work in a bad state. In fact, as the AWS Well-Architected guide says, when you're overloaded or in a heavy error state you should be doing as little work as possible, so that you can recover.
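
One way to act on that advice is to put a hard budget on how much error logging/tracing you're willing to do and drop (but count) everything over it. A rough sketch of the idea; the names here are made up for illustration, not from any specific library:

    import time

    class ErrorEmitBudget:
        """Allow at most `rate` error logs/traces per second; drop and count the rest."""

        def __init__(self, rate=100):
            self.rate = rate
            self.tokens = float(rate)
            self.last = time.monotonic()
            self.dropped = 0

        def allow(self):
            now = time.monotonic()
            self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            self.dropped += 1  # export this as a metric so you know you're shedding
            return False

    budget = ErrorEmitBudget(rate=100)

    def report_error(logger, exc):
        # Stays cheap during an incident: at most `rate` log lines per second,
        # no matter how many requests are failing.
        if budget.allow():
            logger.error("redis call failed: %s", exc)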


We've seen problems with memory usage on failure too. The Python implementation sends data to the collector in a separate thread from the HTTP server operations. But if those exports start failing, it's configured for exponential backoff, so it can hold onto a lot of memory and start causing issues with container memory limits.
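
For what it's worth, and assuming I'm reading the Python SDK right, the BatchSpanProcessor's queue size and export timeout are tunable, so you can at least bound how much it buffers when the collector is unreachable (the numbers below are arbitrary):

    from opentelemetry import trace
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor
    from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

    provider = TracerProvider()
    provider.add_span_processor(
        BatchSpanProcessor(
            OTLPSpanExporter(endpoint="localhost:4317", insecure=True),
            max_queue_size=2048,          # bound on buffered spans; overflow is dropped, not kept
            max_export_batch_size=512,
            schedule_delay_millis=5000,
            export_timeout_millis=10000,  # give up on a batch instead of backing off forever
        )
    )
    trace.set_tracer_provider(provider)

If I recall correctly, the same knobs are also exposed as the standard OTEL_BSP_* environment variables.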


I've configured our systems to start dropping data at this point and emit an alarm metric that logging/metrics are overloaded


I think what they mean is that if you provisioned your system to receive spans for 5% of non-error requests plus a few error requests, and for some random act of god all the requests yield an error, your span collector will suddenly receive spans for all requests.


How do you send all errors? The way tracing works, as I understand it, is that each microservice gets a trace header which indicates if it should sample and each microservice itself records traces. If microservice A calls microservice B and B returns successfully but then A ends up erroring, how can you retroactively tell B to record the trace that it already finished making and threw away? Or do you just accept incomplete traces when there are errors?


You can do head-based sampling and tail-based sampling.

With head sampling, the first service in the request chain can make the decision about whether to trace, which can reduce tracing overhead on services further down.

With tail-based sampling, the tracing backend can make a determination about whether to persist the trace after the trace has been collected. This has tracing overheads, but allows you to make decisions like “always keep errors”.
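
A toy sketch of the tail-based decision itself, leaving out all the buffering and timeouts a real collector needs (in practice this lives in the collector or tracing backend, not your app; the span dicts here are invented for illustration):

    import random

    def keep_trace(spans, ok_sample_rate=0.05):
        """Decide after the whole trace is assembled: keep every trace that
        contains an error span, plus a small fraction of the healthy ones."""
        if any(span.get("status") == "ERROR" for span in spans):
            return True
        return random.random() < ok_sample_rate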


https://opentelemetry.io/docs/concepts/sampling/ describes it as Head/Tail sampling, but in practice with vendors I see it as Ingestion sampling and Index sampling. We send all our spans to be ingested, but have a sample rate on indexing. That allows us to override the sampling at index and force errors and other high value spans to always be indexed.


Maybe the Go client doesn't support that? https://opentelemetry.io/docs/instrumentation/go/sampling/


It does, but the docs aren't clear on that yet. TraceIdRatioBased is the "take X% of traces" sampler that all SDKs support today.
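
For example, in the Python SDK the usual combination looks something like this: ParentBased makes child spans follow the decision already made upstream, and TraceIdRatioBased makes the root decision.

    from opentelemetry import trace
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased

    # Sample ~5% of new traces at the root; downstream spans inherit the decision
    # via the sampled flag in the propagated trace context.
    trace.set_tracer_provider(
        TracerProvider(sampler=ParentBased(TraceIdRatioBased(0.05)))
    )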


Normally yes, but we do a lot of data collection and identifying what's an error is usually hard because of partial errors. We also care about performance, per tenant and per resource with lots of dimensionality and sampling reduces that information for us.


The reality is that most people don't want to manage their own Clickhouse store, and not all engineers can operate with SQL as efficiently as with code (me included). Nonetheless, this is pretty cool!


> not all engineers can operate with SQL as efficiently as with code

I don’t mean for this to sound insulting but I honestly do not think this is an acceptable take to have as a developer.

Not knowing SQL is like refusing to learn any language that has classes in it, simply because you don’t like it.

I’ve heard stories of huge corporations failing product launches because some code was written to SELECT * from a database and filtering it in-app instead of doing the queries correctly, and what’s so fun with these types of issues is that they usually don’t appear until weeks later when the table has grown to a size where it becomes a problem.

When you’re saying that you’d rather find the data in-app than in-database, you’re putting the work on an inferior party in the transaction simply because you can’t be bothered.

The code will never* find the correct data faster than the database.

* there may be exceptions, but they’re far enough between to still say “never”.
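
To make the SELECT * point concrete, a contrived sketch (the table, columns, and DB-API-style connection here are all invented for illustration):

    from datetime import datetime, timedelta

    CUTOFF = datetime.utcnow() - timedelta(days=7)

    def recent_orders_in_app(conn, customer_id):
        # Pulls every row in the table over the wire, then throws most of them away.
        rows = conn.execute("SELECT * FROM orders").fetchall()
        return [r for r in rows
                if r["customer_id"] == customer_id and r["created_at"] > CUTOFF]

    def recent_orders_in_db(conn, customer_id):
        # Only matching rows ever leave the database, and an index on
        # (customer_id, created_at) keeps this fast as the table grows.
        return conn.execute(
            "SELECT * FROM orders WHERE customer_id = ? AND created_at > ?",
            (customer_id, CUTOFF),
        ).fetchall()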


Dropping down to SQL to write a really complex query is, in my professional experience, always a poor use of time. It's far simpler to just write the dumb for-loops over your data, if you can access it.

Of course not all engineers can operate with SQL as efficiently as code -- that's the whole point. Otherwise why would we be writing code? Learning SQL intimately doesn't change that fact.


> Dropping down to SQL to write a really complex query is, in my professional experience, always a poor use of time.

We’re not talking about Assembly here, “dropping down” to SQL is something that anyone should be expected to do as soon as you’re grabbing or modifying any data from a database in any scenario where performance or integrity matters. The errors you can see in situations like this are extremely complex and databases literally exist to solve them for us.

Also, if we just completely disregard the performance for a second and focus on data security instead, how do you ensure sensitive data isn’t passed to the wrong party if you don’t care about what queries are being sent?

I mean, it doesn’t matter if it’s not “in the end” displayed to an end user in the application you’re writing, or if it’s not stored in the intermediary node where your code is running; that data is now unnecessarily on the wire in a situation where it never should have been in the first place. If you end up mixing one customer’s data with another’s and sending all of it in such a way that it could even theoretically be accessed by a third party, that’s a lawsuit waiting to happen, regardless of whether it was “displayed” or “forwarded” or not.

Imagine if you sniffed the packets going to some logistics app you use on your phone and you saw meta-data for all packages in your zip code in the response, or if some widget showing you your carbon footprint actually was based on a response containing the carbon footprint of every customer in the database. Even if it’s just [user_id,co2] it’s still completely unacceptable.

Never mind scenarios where you’re modifying, adding or deleting data, those are even worse and no explanation should be necessary for why.


Obviously it greatly depends on what you're doing. If you're using a relational database as a glorified key-value store for offline or batch processing of a few hundred megabytes of data, sure. Hell, just serialize and unserialize a JSON document on every run if it's small and infrequent enough ¯\_(ツ)_/¯

If you've got a successful data hungry web service with a reasonably normalized schema and moderately complex access patterns though, you're not going to be looping over the whole thing on every page load.


SQL is definitely worth learning. I recently found that processing a 350 kB JSON document by sending it to Postgres is about as fast as using some dedicated Java libraries: https://ako.github.io/blog/2023/08/25/json-transformations.h...

This opens some interesting options if you want to join the result with data from your database.


Way, way, way slower though. I just added something to our app that took 600ms for the naive ‘search and loop’ version (and it kept getting slower the more items you needed, completely unscalable) vs 30ms for the ‘real SQL query’ version. Guess which version actually got committed.


Did your for-loop solution include concurrent access by multiple clients? I highly doubt engineers who "can not operate with SQL as efficiently as code" can implement anything even remotely as robust as what a SQL DBMS offers, even for basic use cases. Are you mutating data? What will happen if the system crashes in the middle of the mutation? How are you handling concurrent writes and reads?


It's unclear whether you mean that it's simpler to make a query and iterate over the rows to massage the result in your application or to make a query and then iterate over the returned rows to make more single-row queries. (Or perhaps some secret third thing I'm not considering.)

I'll admit I'm a little curious about what exactly you mean here.


SQL is code.


There’s a difference between writing OLAP and OLTP SQL queries. Hell, in the industry we even have a dedicated role for people who, among other things, write OLAP queries: data analysts. I’m assuming here that we are talking about writing complex analytical queries.


SQL is code and absolutely worth learning.


"Don't want to manage their own" has for so long been a valid excuse but cloud costs haven't been going down for so long - in many cases prices have increased - and hardware keeps getting more badass. In so many cases it's fear speaking.

A decent-sized server will host a hugely capable instance that you may not have to think about for years. The scoffing at DIY has made sense to some degree, but "it just works brilliantly" keeps getting to be a stronger and stronger case, and most just assume reality can't actually work that well, that it'll be bad, and those folks won't always be right.


With current SSD prices, a box that will do 30 million IOPS can cost you $10K. 30 million IOPS in the cloud would be crazy $$$$.


But in this case we are not even talking about owned/rented HW vs cloud. It's self-hosted (even on cloud) vs SaaS software!

SaaS, especially in this space, can be *extremely* costly, and its cost will scale up quickly as you send more traffic (either willingly or by mistake). Yes, Datadog, New Relic etc. will give you many pre-built and well-thought-out dashboards and some fancy AI-powered auto-detection thing, but they will charge many $$$ for it. Consider that cost management/analysis tools that were historically focused only on cloud are now adding the same tooling for costly SaaS solutions!

I understand that many HN readers are skewed towards SaaS solutions, usually because they work at a SaaS shop, but depending on the size of the company, the overhead of managing it internally can totally be worth it. There is overhead with SaaS as well...


We just left ours running for months in a Docker container. The volume is external; we just replace the container image with a new one, it takes 5 seconds to update, and spans are treated as ephemeral. We store only 7 days of data. We could use S3, but we have no use for that data in the long run.

To be fair, we wanted to get experience with ClickHouse, and it's a special database that needs special attention to detail on both ops and schema design.


I'm beginning to sound like a broken record at this point, but if you don't know SQL very well but know how to use GPT-4, you have access to enough SQL to get a lot more done than you might think.


This is really interesting, thanks for sharing. What's also cool was the low effort needed for this setup (Java autoinstrumentation + Clickhouse exporter + Grafana Clickhouse Plugin).


That's a really informative post, the ClickHouse thing sounds interesting!



