
Nice! Adding this to the post, thanks for the link!


Yep, do you guys have a writeup on this? Altinity actually mentions the Contentsquare case in their video, here: https://www.youtube.com/watch?t=2479&v=pZkKsfr8n3M&feature=y...


Hi

I'm the guy who did the two presentations of ClickHouse at ContentSquare. There are no blog posts on the migration from ES to CH, but you can find the slides of the 2018 presentation here https://www.slideshare.net/VianneyFOUCAULT/clickhouse-meetup... and the slides of the 2019 presentation here https://www.slideshare.net/VianneyFOUCAULT/meetup-a-successf...

There is also a video recording of the 2019 presentation available here: https://www.youtube.com/watch?v=lwYSYMwpJOU NB: the video quality is not great because the camera often loses focus, but it's still understandable.


I'm not sure there is a public writeup. I know that the incredibly talented guy who created the first CH setup at CS planned to write a more global post about data analytics at scale, but after two years I'm still waiting for it.


I stopped answering people about the release date of my next blog post because I keep postponing it ;-).

But don't worry Paul, the day I release it you'll be one of the first to know.


I'll remind him about the post ;)


I address your concern from #1 in the "2. Flexible schema - but strict when you need it" section - take a look at https://www.youtube.com/watch?v=pZkKsfr8n3M&feature=emb_titl...

Regarding #2: ClickHouse scalability is not simple, but Elasticsearch scalability isn't that simple either - ES just has it out of the box, while in ClickHouse you have to use ZooKeeper for it. I agree that for 200 nodes ES may be a better choice, especially for full-text search. For 5 nodes with 10 TB of log data I would choose ClickHouse.

#3 is totally true. I mention it in the "Cons" section - Kibana and its ecosystem may be a deal breaker for a lot of people.

#4. ClickHouse in 2021 has pretty good support in all major languages. And it can talk HTTP, too.
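To illustrate the ZooKeeper point from #2: replication in ClickHouse is declared per table via the Replicated* engines, with ZooKeeper coordinating the replicas. A minimal sketch (the cluster name, ZooKeeper path, and macros are hypothetical and come from each server's config):

```sql
-- Hypothetical replicated table; ZooKeeper keeps the replicas in sync.
CREATE TABLE logs ON CLUSTER my_cluster
(
    timestamp DateTime,
    message   String
)
ENGINE = ReplicatedMergeTree(
    '/clickhouse/tables/{shard}/logs',  -- ZooKeeper path for this table
    '{replica}'                         -- replica name, from server macros
)
ORDER BY timestamp;
```

So scaling out is possible, but unlike ES it requires running and operating a ZooKeeper ensemble alongside the cluster.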


Hi! Assuming you are the author of the PixelJets article, would you consider submitting a talk to the Percona Live Online 2021 conference? It's all about open-source databases. We're running an analytics track and welcome submissions on any and all solutions based on open-source analytic databases. The CFP runs through 14 March.

p.s., Everyone is welcome! If you see this and have a story please consider submitting. No marketing please. We are DBMS geeks.

https://altinity.com/blog/call-for-papers-on-analytics-at-pe...


Thank you for the invitation! I will definitely consider submitting my story.


I am doing exactly this in my current project - I put server logs of API services into a column-based database with extensive SQL support, and it feels great. SQL lets me build a great analytics dashboard in a simple dialect I already know, and in terms of performance it runs circles around the ELK stack (a short animated GIF preview of the UI is available at https://apiroad.net/ ).


ApiRoad is an interesting project, are you running this all by yourself?

In any case, do you have a write-up somewhere of how you use a columnar DBMS to slurp your server logs?


Thank you! Yes, this is a big solo project now. I haven't had time to do a write-up yet. I am a big fan of ClickHouse, so I use it for log storage. Collecting the server logs is not exactly a simple file-parsing setup - I have a gateway daemon that sits in front of the API server. The daemon receives all the requests from subscribers, authorizes that the specific connection has access to the specific API, applies rate limits, and proxies all the connections to the upstream API, dumping HTTP logs to ClickHouse periodically. Then a Vue.js & Laravel powered dashboard queries ClickHouse to generate various stats, which are later used for analytics and usage-based billing.
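A sketch of what such a setup might look like on the ClickHouse side. The schema and column names here are my assumptions for illustration, not the actual ApiRoad tables:

```sql
-- Hypothetical table the gateway daemon batch-inserts HTTP logs into
CREATE TABLE api_logs
(
    event_time  DateTime,
    api_id      UInt32,
    user_id     UInt32,
    status_code UInt16,
    duration_ms UInt32
)
ENGINE = MergeTree
ORDER BY (api_id, event_time);

-- The dashboard can then aggregate usage for stats and billing, e.g.:
SELECT user_id,
       count() AS requests,
       quantile(0.95)(duration_ms) AS p95_ms
FROM api_logs
WHERE event_time >= now() - INTERVAL 30 DAY
GROUP BY user_id;
```

Batching the inserts matters: ClickHouse prefers fewer, larger inserts over many single-row writes, which is why the daemon dumps logs periodically rather than per request.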


Well, good performance of the HTTP request/response cycle is mostly about async operation in some form, where a long-lived daemon can process a large number of requests without spending time on initialization. Workerman and Swoole, powered by the async PHP approach, are very fast and suitable for high load. The problem with async PHP, though, is that you need to be careful with libraries (e.g. the MySQL client) because most of them were designed with short-lived PHP script execution in mind.


ClickHouse is a good (self-hosted) alternative to Elasticsearch for log storage: it saves a lot of space due to better compression, it supports SQL (with regex search instead of useless by-word indexing), and its ingestion speed is great.


> (with regex search instead of useless by-word indexing)

Perhaps I misunderstand your situation, but I don't see any "CREATE INDEX" available in ClickHouse, so won't "SELECT * FROM logs WHERE match(message, '(?i)error.*database')" require a full column scan (including, as you mentioned, decompressing it)? Versus the very idea of an indexer like ES, which is "give me all documents that have the token 'ERROR' and the token 'database'" and doesn't need to table-scan anything.

I only learned about the project 9 minutes ago, so any experience you can share about the actual performance of those queries would be enlightening - maybe it's so fast that my concern isn't relevant.


Clickhouse is designed for full table scans. It allows one index per table, usually a compound key including the date as the leftmost part of the key. This allows it to eliminate blocks of data that don’t contain relevant time ranges. It is also a column store, so the data being read is only the columns used in the query.

If your query is conceptually linearly scalable, ClickHouse is also linearly scalable. Per-core performance is also pretty good (tens of millions of rows per second on good hardware with simple queries, which most log-aggregation queries are).
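A minimal sketch of such a table, with the date leftmost in the sorting key so that time-range queries can skip blocks outside the relevant range (table and column names are illustrative):

```sql
-- Date leftmost in the sorting key enables block elimination by time range
CREATE TABLE app_logs
(
    date    Date,
    level   String,
    message String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(date)
ORDER BY (date, level);

-- This query only touches blocks covering the last 7 days, and only
-- reads the `date` and `message` columns (column store):
SELECT message
FROM app_logs
WHERE date >= today() - 7 AND match(message, '(?i)error');
```

Within the surviving blocks the regex still runs as a scan, but between the time-range pruning and the per-core throughput mentioned above, that scan is often fast enough in practice.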


ClickHouse (like any other SQL DB) works great if you can chop your log lines into fields and store one type per column. Elasticsearch is great here because you don't have to worry about schema; with ClickHouse you will - unless you use the two-arrays trick: one array for the field names, and one for the field values.

If you value being able to store arbitrary log files, ClickHouse is not for you. If you want to build your system to generate tables on the fly- ClickHouse might work.

See: https://github.com/flant/loghouse
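The two-arrays pattern mentioned above can be sketched like this (column names are illustrative; loghouse uses a similar layout):

```sql
-- Arbitrary per-row fields stored as parallel key/value arrays
CREATE TABLE flex_logs
(
    timestamp      DateTime,
    `labels.key`   Array(String),
    `labels.value` Array(String)
)
ENGINE = MergeTree
ORDER BY timestamp;

-- Look up an arbitrary field by finding the position of its key:
SELECT timestamp,
       labels.value[indexOf(labels.key, 'status')] AS status
FROM flex_logs
WHERE has(labels.key, 'status');
```

It works, but every field access goes through `indexOf`, so queries are more verbose and slower than against real typed columns.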


You can use materialized views in ClickHouse to simulate secondary indexes. See https://www.percona.com/blog/2019/01/14/should-you-use-click... for an example of this usage; it's about halfway through the article.
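A minimal sketch of the pattern, assuming a hypothetical `events` table ordered by date that you also want to query efficiently by user:

```sql
-- The MV maintains a copy of selected columns sorted by user_id,
-- acting as a secondary index into the date-ordered base table.
CREATE MATERIALIZED VIEW events_by_user
ENGINE = MergeTree
ORDER BY (user_id, event_date)
POPULATE
AS SELECT user_id, event_date, event_id
FROM events;

-- Point lookups by user now avoid scanning the whole base table:
SELECT event_id FROM events_by_user WHERE user_id = 42;
```

The trade-off is extra storage and write amplification: every insert into the base table is also materialized into the view.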

Disclaimer: I work for Altinity which is commercializing ClickHouse.


PWA is not good enough for us due to heavy usage of the mobile device camera and client-side picture resizing for the package-receiving process in the warehouse. PWAs fail to provide consistent performance in such conditions, and issues with Bluetooth scanners in Safari/PWA on iOS devices add up. Other than that, I agree Vue.js is superior to React in a lot of use cases, especially for the web.


I agree. We still admire Vue.js after using it for more than a year and writing hundreds of lines of Vue.js code every day. There's no chance I would choose React+Redux when I know I need quick progress on my web project - Vue.js allows you to be very productive, cutting corners in the right places while still keeping reasonable code quality, and writing very little boilerplate.


MobX+React would be just like Vue, but with the benefits of the larger React ecosystem and full type checking with TypeScript/Flow, and the drawback of mandatory build tooling.


It is similar, but it means you get off the highway of the React "preferred way" - for example, if you start working with React Native you might run into the issue of using different state-management solutions for your web and mobile stacks. It looks like nobody is using MobX with RN in serious projects.


Actually, we use React+MobX on both our web and RN stacks and it's been great. What we really like is that it's simple and extremely performant, since in most cases it automatically re-renders only the React components that depend on the state that's been modified.

We jumped directly from a flux-like state management to mobx (skipping Redux) and the development time has gone down significantly as well.


Cool! I think you guys should do a writeup, since the Google results for "react native mobx" clearly show the mind share is not there yet.


Chiming in with another "we use mobx and react native." Sorry for the overload of comments like this but I think it's perfectly fine.


There is no React-preferred way of state management. I have used MobX in many of my React Native projects.


I've used MobX for React Native (though I went back to Redux for familiarity). I found it to be transparent enough.


We're using RN with mobx and it has been great. We'd probably have 3x more code if we went with redux.


I've been wondering about that too, and would love to see some write-ups from people who have used React+MobX for at least a few months, if not more: how it worked out for them, pluses and minuses.

Also tutorials for React+MobX.


React Native does not generate any HTML. It generates native controls.


Vue.js is just a better option for everyday development in small and mid-sized teams: it gives more freedom working with arbitrary HTML, which is huge, and it's easier to start with - you don't need a compiler to use Vue across your legacy codebase. React is a good fit if you are a hard-core full-time frontend dev on a big team, I guess. That's why the potential of Vue.js popularity is ~25-30% of jQuery's worldwide usage, while React will probably get 5-10% at most - that's just my impression after using both React and Vue. http://pixeljets.com/blog/why-we-chose-vuejs-over-react


I'm assuming in the web dev world "legacy codebase" is used very loosely here.

In my company, "legacy codebase" refers to C code from the '70s that is still ticking today.


"Legacy codebase" generally refers to anything that doesn't an meet an organization's current standards for new development, especially to the extent that that limits maintenance efforts.

That standards may move faster in web dev doesn't make the use of the term any looser there.


That's right, in the web dev world "legacy codebase" is 3 or 4 years old, created by a previous team who all quit once they finished the rewrite of the previous "legacy codebase".


3 or 4 years? Seems generous. Lately it seems anything more than a year old is "legacy" or "technical debt" because the team is now using other frameworks or libraries.


In my experience there's no time limit for "legacy". These days it pretty much just means "code we have to maintain that we would like to replace".

