The future of kdb+? (timestored.com)
219 points by geph2021 4 months ago | 66 comments



I thought I'd throw in TimeScale. It's a postgres extension, so all your SQL stuff is just the same (replication, auth, etc).

It's also a column store, with compression. Runs super fast, I've used it in a couple of financial applications. Huge amounts of tick data, all coming down to your application nearly as fast as the hardware will allow.
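To give a flavour of the setup, a minimal sketch (table and column names are invented; create_hypertable and the compression settings are stock TimescaleDB):

  import psycopg2

  conn = psycopg2.connect("dbname=market")
  with conn.cursor() as cur:
      # A plain Postgres table, turned into a time-partitioned hypertable
      cur.execute("""
          CREATE TABLE trades (
              ts     timestamptz NOT NULL,
              symbol text        NOT NULL,
              price  double precision,
              size   bigint
          );
          SELECT create_hypertable('trades', 'ts');
      """)
      # Columnar compression, segmented by symbol so per-symbol scans stay cheap
      cur.execute("""
          ALTER TABLE trades SET (
              timescaledb.compress,
              timescaledb.compress_segmentby = 'symbol'
          );
      """)
  conn.commit()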

Good support, the guys on Slack are responsive. No, I don't have shares in it, I just like it.

Regarding kdb, I've used it, but there are significant drawbacks. Costs a bunch of money, that's a big one. And the language... I mean it's nice to nerd out sometimes with a bit of code golf, but at some point you are going to snap out of it and decide that single characters are not as expressive as they seem.

If your thing is ad-hoc quant analysis, then maybe you like kdb. You can sit there and type little strings into the REPL all day in order to find money. But a lot of things are more like cron jobs, you know you need this particular query run on a schedule, so just turn it into something legible that the next guy will understand and maintain.


Another superpower of TimeScale is that it plays nicely with other Postgres extensions. We had a really good experience with it combined with PostGIS. Scenarios like "Show sensors on a map with value graphs for each sensor" can be done in a single query, and it's fast and beautiful.
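To make that concrete, a hedged sketch of such a query (the schema is invented for illustration): PostGIS narrows to the sensors inside a bounding box, and TimescaleDB's time_bucket aggregates each one's readings, all in one round trip:

  import psycopg2

  conn = psycopg2.connect("dbname=sensors")
  with conn.cursor() as cur:
      cur.execute("""
          SELECT s.id,
                 ST_AsGeoJSON(s.geom)           AS location,
                 time_bucket('5 minutes', r.ts) AS bucket,
                 avg(r.value)                   AS avg_value
          FROM sensors s
          JOIN readings r ON r.sensor_id = s.id
          WHERE ST_Within(s.geom, ST_MakeEnvelope(5.9, 45.8, 10.5, 47.8, 4326))
            AND r.ts > now() - interval '1 day'
          GROUP BY s.id, s.geom, bucket
          ORDER BY s.id, bucket
      """)
      rows = cur.fetchall()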


I am using Timescale at work and I like them a lot. If your data is nicely structured it's a breeze. But my data is kind of pathological (source can just change the structure and I gotta put up with it), so I'd honestly use Influx in a heartbeat if their pricing wasn't totally insane.


I actually quit a quant trading job after 2 weeks because they used kdb+. I could use it but the experience was so bad...

People could complain about the abysmal language design or debugging, but what I found most frustrating was the coding conventions they had (or had not), and I think the language and the community play a big role there. But also the company culture: I asked why the code was so poorly documented (no comments, single-letter parameters, arcane function names). "We understand it after some time, and this way other teams cannot use our ideas."

Overall, their whole stack was outdated and ofc they could not do very interesting things with a tool such as Q. For example, they plotted graphs by copying data from qStudio to Excel...

The only good thing was they did not buy the docker / k8s bs and were deploying directly on servers. It makes sense that quants should be able to fix things in production very quickly but I think it would also make sense for web app developers not to wait 10 minutes (and that's when you have good infra) to see a fix in production.

I have a theory on why quants actually like kdb: it's a good *weapon*. It serves some purpose but I would not call it a *tool* as building with it is tedious. People like that it just works out of the box. But although you can use a sword to drive nails, it is not its purpose.

Continuing on that theory, LISP (especially Racket) would be the best *tool* available, as it is not the most powerful language out of the box but lets you build a lot of abstractions, with features to modify the language itself. C++ and Python are just great programming languages, as you can build good software with them, Python also being a fairly good weapon.

Q might give the illusion of being the best language to explore quant data, but that's just because quants do not invest enough time into building good software and using good tools. When you actually master a Python IDE, you are definitely more productive than any Q programmer.

And don't get me started on performance (the link covers it anyway even though the prose is bad).


The article calls out Python and DuckDB as possible successors.

I remember being very impressed by kdb+ (went to their meetups in Chicago). Large queries ran almost instantaneously. The APL-like syntax was like a magic incantation that only math types were privy to. The salesperson mentioned kdb+ was so optimized that it fit in the L1 cache of a processor of the day.

Fast forward 10 years. I’m doing the same thing today with Python and DuckDB and Jupyter on Parquet files. DuckDB not only parallelizes, it vectorizes. I’m not sure how it benchmarks against kdb+ but the responsiveness of DuckDB at least feels as fast as kdb+ on large datasets. (Though I’m sure kdb+ is vastly more optimized). The difference? DuckDB is free.
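For a sense of what that workflow looks like, a rough sketch (file layout and column names are hypothetical): one-minute bars straight off tick files, no server required:

  import duckdb

  # DuckDB scans the Parquet files in parallel with vectorized execution
  bars = duckdb.sql("""
      SELECT symbol,
             date_trunc('minute', ts) AS minute,
             first(price ORDER BY ts) AS open,
             max(price)               AS high,
             min(price)               AS low,
             last(price ORDER BY ts)  AS close,
             sum(size)                AS volume
      FROM read_parquet('ticks/*.parquet')
      GROUP BY symbol, minute
      ORDER BY symbol, minute
  """).df()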


We use DuckDB similarly but productionize by writing pyarrow code. All the modern tools (DuckDB, pyarrow, polars) are fast enough if you store your data well (parquet), though we work with not quite “big data” most of the time.

It’s worth remembering that all the modern progress builds on top of years of work by Wes McKinney & co (many, many contributors).


Yes, Wes McKinney was involved in Pandas, Parquet, and Arrow.


I remember reading a while back that when building pandas he drew a lot of inspiration from things like APL and, I assume, kdb+.


I just realized all the data tools I use are animals.

Pandas

Polars (polar bear)

DuckDB

Python


Do you use duckdb for real-time queries or just historical? You mentioned parquet but afaik it's not well suited for appending data.


Also a tip: for interactive queries, do not store Parquet in S3.

S3 is high-throughput but also high-latency storage. It's good for bulk reads, but not random reads, and querying Parquet involves random reads. Parquet on S3 is ok for batch jobs (like Spark jobs) but it's very slow for interactive queries (Presto, Athena, DuckDB).

The solution is to store Parquet on low-latency storage. S3 has an offering called S3 Express One Zone (low-latency S3 that costs somewhat more). Or EBS, which is block storage that doesn't suffer from S3's high latency.


You can do realtime in the sense that you can build NumPy arrays in memory from realtime data and then use these as columns in DuckDB. This is the approach I took when designing KlongPy to interop array operations with DuckDB.
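A minimal sketch of that interop (the arrays stand in for live buffers; DuckDB's replacement scan resolves the DataFrame by variable name):

  import numpy as np
  import pandas as pd
  import duckdb

  # Pretend these arrays are being appended to by a live feed
  ts    = np.array(['2024-01-02T10:00:00', '2024-01-02T10:00:01'], dtype='datetime64[s]')
  price = np.array([100.5, 100.7])

  ticks = pd.DataFrame({"ts": ts, "price": price})
  # DuckDB finds `ticks` in local scope and queries it in place
  print(duckdb.sql("SELECT max(price) AS high FROM ticks").df())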


Not real time, just historical. (I don’t see why it can’t be used for real time though... but haven’t thought through the caveats)

Also, not sure what you mean by Parquet is not good at appending? On the contrary, Parquet is designed for an append-only paradigm (like Hadoop back in the day). You can just drop a new parquet file and it’s appended.

If you have 1.parquet, all you have to do is drop 2.parquet in the same folder or Hive hierarchy. Then query:

  SELECT * FROM '*.parquet'
DuckDB automatically scans all the Parquet files in that directory structure when it queries. If there's a predicate, it uses the Parquet metadata's column statistics to skip files that don't contain the requested data, so it's very fast.

In practice we use a directory structure called Hive partitioning, which helps DuckDB do partition elimination to skip over irrelevant partitions, making it even faster.

https://duckdb.org/docs/data/partitioning/hive_partitioning
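A rough sketch of the two pieces together (paths are hypothetical): append by dropping files into date partitions, query with partition pruning:

  import duckdb

  # Layout: ticks/date=2024-01-02/part-0.parquet, ticks/date=2024-01-03/part-0.parquet, ...
  df = duckdb.sql("""
      SELECT symbol, avg(price) AS avg_price
      FROM read_parquet('ticks/*/*.parquet', hive_partitioning = true)
      WHERE date = '2024-01-03'  -- pruned from directory names; other files never opened
      GROUP BY symbol
  """).df()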

Parquet is great for appending!

Now, it's not so good at updating, because it's a write-once format (not read-write). Updating a single record in a Parquet file entails regenerating the entire file. So if you have late-arriving updates, you need to do extra work to identify the partition involved and overwrite it. Either that, or use bitemporal modeling (add a data-arrival timestamp [1]) and put a latest-date clause in your query (entailing more compute). If you have a scenario where existing data changes a lot, Parquet is not a good format for you. You should look into Timescale (a time-series database based on Postgres).

[1] https://en.wikipedia.org/wiki/Bitemporal_modeling
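And for the "latest date clause" approach above, a minimal DuckDB sketch (column names invented):

  import duckdb

  # Every row carries the timestamp it arrived at; for each trade_id,
  # keep only the most recently ingested version.
  latest = duckdb.sql("""
      SELECT *
      FROM read_parquet('trades/*.parquet')
      QUALIFY row_number() OVER (
          PARTITION BY trade_id ORDER BY ingested_at DESC
      ) = 1
  """).df()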


Not surviving more than two weeks in a QF role because of kdb, and then suggesting they should rewrite everything in LISP, is one of the most quintessentially HN comments I think I have ever seen.


You didn’t learn Q in two weeks to the extent that you are qualified to assert that someone who knows how to use a Python IDE is more productive than a quant dev with decades of experience.

I find it much more likely that you couldn’t understand their code and quit out of frustration.

If you were a highly skilled quant dev and this was a good seat, quitting after two weeks would have been a disaster to manage the next transition given the terms these contracts always have.


Their pykx integration is going a long way to fix some of the gaps in:

- charting

- machine learning/statsmodels

- html processing/webscrapes

Because for example you can just open a Jupyter Notebook and do:

  import pykx as kx
  import matplotlib.pyplot as plt

  df = kx.q("select from foo where bar").pd()  # convert the q result to pandas
  plt.plot(df["x"], df["y"])

It’s truly an incredibly seamless and powerful integration. You get the best of both worlds, and it may be the saving feature of the product in the next 10 years.


I think this will only work with regular qSQL on a specific database node, i.e. RDB, IDB, HDB[1]. It will be much harder for a mortal Python developer to use Functional qSQL[2] which will join/merge/aggregate data from all these nodes. The join/merge/aggregation is usually application-specific and done on some kind of gateway node(s). Querying each of them is slightly different, with different keys and secondary indices, and requires using a parse tree (AST) of a query.

---

[1] RDB = RAM DB (recent in-memory data); IDB = Intraday DB (recent data which doesn't fit into RAM); HDB = Historical DB (usually partitioned by date or another time-based or integral column).

[2] https://code.kx.com/q/basics/funsql/


That’s accurate enough. I think the workflow was more built for a q dev occasionally dipping into python rather than the other way around.

I think you touch on something really interesting, which is the kink in the kdb+ learning curve when you go from really simple functions, tables, etc. to actually building a performant kdb architecture.


It will be interesting to see what comes of some of the things being put on their roadmap (https://code.kx.com/pykx/2.5/roadmap.html#upcoming-changes); it seems to be moving toward an API similar to Polars.


[flagged]


It's not a good filter in that case. I can learn obscure languages just fine, but that doesn't make me any more pleasant to hang out with.


I'm not sure that was ever a requirement in these industries


It is not a requirement. Just a way to weed out people who think they are special snowflakes.


I'm perfectly capable of learning obscure language _and_ thinking I'm a special snowflake. (In fact, I'm a special snowflake _because_ I am into weird languages.)


One of the compelling features of kdb+/Q that isn't explicitly called out here is vertical integration: it's a single piece of technology that can handle the use-cases of a whole stack of other off-the-shelf technologies you'd otherwise need to select and glue together. The Q language, data serialization primitives, and IPC capabilities allow a skilled programmer to tailor-build exactly the system you need in one language, often in a codebase that would fit on a few sheets of paper instead of a few hundred or thousand.

If your organization has already committed to serving some of these roles with other pieces of software, protocols, or formats, the benefits of vertical integration- both in development workflow and overall performance- are diminished. When kdb+ itself is both proprietary and expensive it is understandably difficult to justify a total commitment to it for new projects. It's a real shame, because the tech itself is a jewel.


I agree that the vertical integration capability of kdb+/Q is amazing, and it is beyond comprehension why Kx themselves don’t effectively leverage it. Kx Platform appears to be mostly written in Java, and the APIs callable from Q are not very well documented. My team and I find the dashboards product difficult to use, and there are some nasty bugs that cause frequent editor crashes for dashboards of moderate complexity. Q is so feature-rich that it would be a blast to write web applications in, but instead we’re forced to use this drag-and-drop editor if we want to make something available to our users.

I think Shakti could become a viable competitor to Kx if they included libraries that handle some common enterprise usecases, such as load balancing, user permissions and SSO. I have no doubt that an experienced K programmer could whip this up in a week or two, but in my experience a sufficiently large enterprise will specify that all these capabilities need to be implemented before they let the product in the door.


I'm a little too close to be throwing stones, but without going into specifics I believe that key leaders at Kx do not properly appreciate the unique characteristics and benefits of their own technology, and are trapped in a mindset of trying to make their products more similar to their competition in order to make sales and marketing easier. In the process, they discard their competitive advantage. Tale as old as time.


I think it is very difficult to judge how much of an advantage your competitive advantage actually is. It’s very easy to look at the things which directly cost you sales and conclude that those are the things you need to fix rather than doubling down on your strengths. The most common way to avoid that is to go too far in the other direction and become convinced that your niche technology is vastly superior to the mainstream choice and anyone who rejects you for your shortcomings is just shortsighted and wrong.

From the outside it’s always seemed that kdb fans tend to land in the second camp, and I think it would be understandable for Kx to have overcorrected into undervaluing their work instead.


I agree that being able to write one piece of code that solves your use case is a big benefit over having to cobble together a message queue, stream processor, database, query engine, etc.

We've been playing around with the idea of building such an integration layer in SQL on top of open-source technologies like Kafka, Flink, Postgres, and Iceberg, with some syntactic sugar to make timeseries processing nicer in SQL: https://github.com/DataSQRL/sqrl/

The idea is to give you the power of kdb+ with open-source technologies and SQL in an integrated package, by transpiling SQL, building the computational DAG, and then running a cost-based optimizer to "cut" the DAG across the underlying data technologies.


   Get a free version out there that can be used for many things…
I think this has been the biggest impediment to kdb+ gaining recognition as a great technology/product and growing amongst the developer community.

Having used kdb+ extensively in the finance world for years, I became a convert and a fan. There’s an elegance in its design and simplicity that seems very much rooted in the Unix philosophy. After I left finance, and no longer worked at a company that used kdb+, I often felt the urge to reach for kdb+ to use for little projects here and there. It was frustrating that I couldn’t use it anymore, or even just show colleagues this little known/niche tool and geek out a little on how simple and efficient it was for doing certain tasks/computations.


Isn't there a free version or something?

I had to write some C++ code in the past to send data into kdb and also a decoder for their wire protocol. For both I definitely had a kdb binary to test against.

I just needed to test against it. Maybe Kx gave us a development license or something, it was a good few years ago.


They do have a free version for non-commercial work.


Were any of the open source versions such as ngn/k or Kerf etc. usable for you?


Kerf1 has only been open source for a fairly short time, and prior to that it was proprietary. ngn/k is tremendously less feature-rich than Q/k4, has some built-in constraints that make building large programs difficult, and does not come with the "batteries included" necessary for building distributed systems. Neither is currently a credible alternative to kdb+ for production environments.


Well, you would have to know how to code in k, not just q; the syntax is a lot more terse, and a lot of features are missing.


I agree with everything in this article. If you're building from scratch, just store your data in Parquet and access it via Polars or DuckDB.
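For anyone who hasn't tried it, the Polars version of that pattern is about this much code (names hypothetical); the lazy scan pushes the filter down into the Parquet reader:

  import polars as pl

  result = (
      pl.scan_parquet("ticks/*.parquet")     # lazy: nothing is read yet
        .filter(pl.col("symbol") == "AAPL")  # predicate pushdown
        .group_by("symbol")
        .agg(pl.col("price").mean().alias("avg_price"))
        .collect()                           # executes the optimized plan
  )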

I built my own language for time-series analysis because of how much I hated q/kdb+, but Python has been the winner for a bunch of years now.


I built a (moderately successful) startup using kdb+. It was what I knew, and it helped us build a robust product quickly. But as we scaled, we had to rewrite in FOSS to ensure we could scale the team.

Agree with all the recommendations, except I think kx should open source the platform. This will attract the breed of developer that will want to contribute back to the ecosystem with improvements and tools.


What was the startup? What FOSS did you move to?


Kdb+ seems really cool, and I've learned it a little for fun along with APL. It would actually be pretty cool for a lot of uses in my industry too, but the price is just crazy. We can't pay like $100k/CPU or whatever it is that the banks pay. So they've basically ignored a HUGE number of potential customers.


They found a niche that can pay the price for an innovative product. I believe they did the right thing; after all, it is not a product trying to solve every problem in the world. Other people could learn from their techniques and do the same for other areas and languages.


Not quite where I was going. The product does seem to be good, and I'd think there is demand for it in many industries, but instead of using discriminatory pricing, charging less to those with a much lower ability to pay, they just ignore the segment entirely. Maybe they know what they're doing, though. It's a shame I don't get to use it at work.


Semiconductor manufacturers understand that giving free samples of their chips to hobbyists creates an environment that breeds future sales: if 1 out of the 1000 people they mailed samples uses their chip in the design for a commercial product, they come out ahead.

Proprietary programming languages that are inconvenient for hobbyists to obtain- any more friction than cloning a git repo or installing via a package manager- have stunted open-source ecosystems, and in turn limited opportunities for grass-roots adoption.


A few corrections to the article.

1. ClickHouse is not a new technology — it has been open-source since 2016 and in development since 2009.

2. ClickHouse can do all three use cases: historical and real-time data, distributed and local processing (check clickhouse-local and chdb; quick sketch below).

3. ClickHouse was the first SQL database with ASOF JOIN in the main product (in 2019) - after kdb+, which is not SQL.
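To illustrate points 2 and 3 together, a hedged sketch using chdb (embedded ClickHouse), with the data inlined just for the example: each trade is matched with the latest quote at or before its timestamp:

  import chdb

  print(chdb.query("""
      WITH trades AS (
          SELECT * FROM values('symbol String, ts DateTime, price Float64',
              ('A', '2024-01-02 10:00:01', 100.5),
              ('A', '2024-01-02 10:00:03', 100.7))
      ),
      quotes AS (
          SELECT * FROM values('symbol String, ts DateTime, bid Float64',
              ('A', '2024-01-02 10:00:00', 100.4),
              ('A', '2024-01-02 10:00:02', 100.6))
      )
      -- ASOF JOIN: one equality condition plus one inequality on time
      SELECT t.symbol, t.ts, t.price, q.bid
      FROM trades AS t
      ASOF JOIN quotes AS q ON t.symbol = q.symbol AND t.ts >= q.ts
  """, 'Pretty'))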


I run a data consultancy with a big focus on ClickHouse. There is a lot of interest in replacing KDB with it. I’ve had probably 10 conversations with companies looking at a migration.

Tellingly, nobody has pulled the trigger on a migration yet, as I think it's a big call given all of the integrations that KDB sprouts, but it definitely feels like the spiritual successor.


3 is a point that’s lost on people who use Q and related things for financial calculations. They picked kdb+ for a reason, and it wasn’t the database. I took that as the point of the post.


Is it still possible to learn from scratch and make big bucks developing for kdb+ (k/q)? I remember seeing an open position a few years ago which paid like 1MM per year. Astounding.


Nice article, thanks for sharing it. It's a pity kdb+ has a DeWitt Clause, so that no one can benchmark it against other databases from the article. I wonder if they have any public benchmarks held by a 3rd-party.


There are certainly enough rubes out there to sell the next KDB+ to: https://shakti.com/


I feel kdb is like the equivalent of a drag racer: generally useless, but great at one (or a few) things in very limited environments.


Even if Python has "won" in the space, the inertia of existing technical debt, and of "it isn't broken, so why fix it", will be an issue. I have 5+ years of Python experience, and migrating to a new platform is at least a year-long project, if not multi-year.

Greenfield development though would use Python.


KDB is an absolute nightmare, a barbaric piece of tech that should have never existed.

Here is a link on how you do queries: https://code.kx.com/q/basics/funsql/

TL;DR:

This is a select:

  q)t:([] c1:`a`b`a`c`a`b`c; c2:101+til 7; c3:1.11+til 7)

And this is another select:

  q)?[t; ((>;`c2;35);(in;`c1;enlist[`b`c])); 0b; ()]

Mind that these are the basic queries :)))))

The future of kdb+ is in the toilet.


The first one is not a select but syntax for defining a small in-memory table named t. You can then do a select on this table. The second is a "functional form" of select i.e. an alternative syntax for select with extended capabilities. It is an advanced feature that is rarely used, but it's there when you need it "for programmatically-generated queries, such as when column names are dynamically produced". Written in the usual syntax this particular expression is the same as "select from t where c2>35,c1 in `b`c".


Sadly, the management at kx/fd didn't have the vision to push this product beyond being a boutique platform for a handful of rich finance firms, and their moment has passed.


Not 100% sure why it’s often idolized on HN.

We’ve maintained a financial exchange w/ margining for 8 years with it, and I guarantee you that everyone, customers and employees alike, was more than relieved once we were able to lift and shift the whole thing to Java.

The readability and scalability are abysmal as soon as you move on from a quant desk scenario (which, everyone agrees, it is more than amazing at... pandas and dask frames all feel like kindergarten toys in comparison). The disaster recovery options are basically bound to having distributed storage, which is by the way "too slow" for any real KDB application, given that the whole KDB concept marries storage and compute in a single thread. And the historical-data use cases, such as mentioned in the article, very quickly become awful: one kdb process handles one request at a time, so you end up having to deploy & maintain hundreds of RDBs keeping the last hour in memory, HDBs with the actual historical data, pausing for hourly write-downs of the data, mirroring trees replicating the data using IPC over TCP from the matching engine down to the RDBs/HDBs, and recon jobs to verify the data across all the hosts.

Not to mention that such a TCP-IPC distribution tree of single-threaded applications means any single replica stuck down the line (e.g. a big query, or too slow to restart) will typically lead to a complete lockup, all the way up to the matching engine. So then you need to start writing logic for circuit breakers to trip both the distribution & the querying (nothing out of the box). And at some point you need to start implementing custom sharding mechanisms for both distribution & querying (nothing out of the box, once again!) across the hundreds of processes and dozens of servers (which has implications for the circuit breakers), because replicating the whole KDB dataset across dozens of servers (to scale the requests/sec you can actually serve in a reasonable timeframe) gets absolutely batshit crazy expensive.

And this is the architecture as designed and recommended by the KX consultants that you end up having to hire to “scale” to service nothing but a few billions dollars in daily leveraged trades.

Everything we have is now in Java: all financial/mathematical logic ported over 1:1 with no changes in data schema (neither in-house nor for customers). It uses disruptors and convenient Chronicle/Aeron queues that we can replay anytime (recovery, certifying, troubleshooting, rollback, benchmarks, etc.), plus infinitely scalable and sharded S3/Trino/ScyllaDB for historical data. Performance is orders of magnitude up (despite the thousands of hours micro-optimizing the KDB stack + the millions in KX consultants, and without any real Java optimizations), incidents became essentially non-existent overnight, and the payroll + infra bills also got divided by a very meaningful factor :]


I think the adulation is mainly driven by a few things:

1. it was fast by a huge margin for its time

2. the reason for its speed is the language behind it

3. it uses an esoteric language and still attains success

4. the core engine is implemented using surprisingly few lines of code

5. the core has been written and maintained by one person

All of these are things I've heard, so I can't claim they're 100% true, but I'm sure it's a combination of some of them.

I feel like APL and all its relatives had long ago gained legendary status. So the legend lives on - maybe longer than it should.

Don't get me wrong. It's still amazing!


Compared to similar dynamic scripting languages, Q is very fast. Compared to statically compiled languages, it can be surprisingly competitive, but is usually slower. The truly distinctive thing about Q is its efficiency as a user interface: at a REPL you can rattle off a short sequence of characters to transform and interrogate large datasets at interactive speeds and flexibly debug complex distributed systems live. In the right hands, it's a stunningly effective rapid-application-development tool (the above "quant desk scenario"); this was perhaps even more true in the k2 days, when it was possible to build ugly but blisteringly fast and utilitarian data-bound GUIs for K programs in a few lines of code. There's certainly an abundance of romanticism and mythology surrounding it, but some of the claims are real and enduringly unmatched.


Python in a Notebook is “REPL-like” and much more modern.

And though I agree low code is important, Streamlit or Dash are a much more fully featured and open way to do that.

I agree KDB has a good development workflow, but I think the same is available in an open source stack like ClickHouse + Python + Jupyter.


   And this is the architecture as designed and recommended by the KX consultants that you end up having to hire to “scale” 
I think this hits on one of the major shortcomings of how FD/Kx have managed the technology going back 15+ years, IMHO.

Historically it’s the consultants that brought in a lot of income, with each one building ad-hoc solutions for their clients and solving much more complicated enterprise-scale integration and resilience challenges. FD/Kx failed to identify the massive opportunity here, which was to truly invest in R&D and develop a set of common IP, based on robust architectures, libraries and solutions around the core kdb+ product that would be vastly more valuable and appealing to more customers. This could have led to a path where open sourcing kdb+ made sense, if they had a suite of valuable, complementary functionality that they could sell. But instead, they parked their consultants for countless billable hours at their biggest paying customer’s sites and helped them build custom infra around kdb+, reinventing wheels over and over again.

They were in a unique position for decades, with a front row seat to the pain points and challenges of top financial institutions, and somehow never produced a product that came close to the value and utility of kdb+, even though clearly it was only ever going to be a part of a larger software solution.

In fairness they produced the delta suite, but its focus and feature set seemed to be constantly in flux and underwhelming, trying to bury and hide kdb+ behind frustratingly pointless UI layers. The more recent attempts with Kx.ai I’m less familiar with, but seem to be a desperate marketing attempt to latch onto the next tech wave.

They have had some very talented technical staff over the years, including many of their consultants. I just think that if the leadership had embraced the core technology and understood the opportunity to build a valuable ecosystem, with a goal towards FOSS, things could look very different. All hindsight of course :)

Maybe it’s not too late to try that…


I'm very curious about this rewrite in Java, especially the orders of magnitude improvement. That sounds extremely impressive, and something that I wouldn't have considered possible. Can you share a bit more about how this performance improvement is achieved?


Well, I don't think the founders of that exchange complain about KDB that much. After all, KDB allowed them to go to market quickly and make billions, and then they changed the tech stack when demand justified it. So what? KDB was never meant to run a large exchange, but you just demonstrated that it can run a smaller one.

> ... and without any Java optimizations really ...

Come on, be honest! All of the core tech needs to be implemented in highly optimized, GC-free Java. And you need to hire senior Java consultants who are highly specialized and have been doing that for 10+ years, and they also cost millions. I happen to know that BitMEX (located in Asia) has such consultants working from the EU. So, it's that easy to hire them!


Full disclosure: I work for KX. In fact, my job is to connect with developers to learn about their experience with KX, so I can help to make it better. I am always open to feedback about what we can improve, and while no product is perfect, I think there’s a lot in this blog that’s worth addressing.

For benchmarks, I would check out STAC M3... kdb+ holds 17 world records there, and that is something we're proud of. The ClickBench benchmark cited in the article, however, isn't designed for time-series databases, and kdb+ isn't included (probably for that reason), so I don't think it's relevant here. We also think that speed, and performance in general, is still important to our customers, as they continue to affirm.

As far as accessibility is concerned, I’d like to address in multiple parts:

1) We are invested in creating cloud-native features that are more appealing for smaller firms

2) q is the best language out there (in our opinion), but we also offer a path for Python (including Polars) and SQL developers, which is essential to expanding the kdb+ userbase to the maximum extent. Our entire suite of Fusion interfaces was built to enable more interoperability. We also don't mandate language lock-in... there is nothing preventing other languages from being used with kdb+.

3) Pricing—this comes up a lot. We already offer a free edition of kdb+ for non-commercial use that is very popular. We recognize there’s more we can do in this area (an opinion expressed by KX leadership too) so new pricing models are actively being evaluated.

4) Our latest release of kdb+ 4.1 included a renewed focus on ease of installation and use, and a new documentation hub is being launched this year to further enhance the developer experience.

5) Our Community is growing rapidly – with now over 6000 members and 10 courses available in KX Academy. We have more and more developers networking to help others learn kdb+ every day with a month-over-month net new increase of members for the past 30 months. We’ve recently launched a Slack channel and developer advocacy program too.

There’s a lot of criticism of kdb+ (and KX) in this article, but a lot of the things devs love most about kdb+ have been left out. This includes efficiency/compactness, the expressiveness of q, vertical integration, and the speedy development workflow. Sure, if you want to combine 3-5 tools to do what kdb+ does, you can go that route, but we feel we offer a vastly superior experience with performance at scale, a quality that extends to ALL our products, including Delta & KDB.AI, since they are all built on kdb+.

Note: I reached out to the author to discuss, but he declined to talk to us. We posted a response on his blog too, but he never published the comment. It's been a pretty closed off situation for us, so leaving this here.


For anyone else reading this, the author did post this comment on the blog, and added a reply there too.


TL;DR from the article:

Alternatives (which are open source) to KDB+ are split into two categories:

New Database Technologies (tick data store & ASOF JOIN): Clickhouse & QuestDB

Local Quant Analysis: Python – with DuckDB & Polars

Some personal thoughts:

Q is very expressive, and impressive performance can be extracted from kdb+, but the drawbacks are proprietary formats, vendor lock-in, a proprietary language, licence costs, and reliance on external consultants to make the system run adequately, all of which drive up operational costs.

I'm personally excited to see the open-source alternative stack emerging. Open Source time-series databases and tools like duckdb/polars for data science are a good combination. Storing everything in open formats like Parquet and leveraging high-performance frameworks like Arrow is probably where things are heading.

Seeing some disruption in this industry specifically is interesting; I think it will be beneficial, particularly for developers.

NB: disclosing that I'm from QuestDB, to put these thoughts in perspective


It is an old product that is no longer relevant, and there is no longer any demand for it. Time to move on.


"Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."

https://news.ycombinator.com/newsguidelines.html


Trillions of dollars in the financial system beg to differ.


Saudi league I think



