Hacker News new | past | comments | ask | show | jobs | submit login
Harlequin: DuckDB IDE for the terminal (harlequin.sh)
309 points by billowycoat on Sept 20, 2023 | hide | past | favorite | 58 comments



Hi everyone! I made this. Tried posting it to Show yesterday, glad this thread is getting more momentum!

For the past four months I've been working (part-time, this is OSS after all) on Harlequin, a SQL IDE for DuckDB that runs in your terminal. I built this because I work in Data, and I found myself often reaching for the DuckDB CLI to quickly query CSV or Parquet data, but then hitting a wall when using the DuckDB CLI as my queries got more complex and my result sets got larger.

Harlequin is a drop-in replacement for the DuckDB CLI that runs in any terminal (even over SSH), but adds a browsable data catalog, full-powered text editor (with multiple buffer support), and a scrollable results viewer that can display thousands of records.

Harlequin is written in Python, using the Textual framework. It's licensed under MIT.

Yesterday I released v1.0.0: you can try it out with `pip install harlequin`, or visit https://harlequin.sh for docs and other info.


Lovely tool. I'll certainly try it out. The code fragments in the documentation seem to be not selectable, though (or maybe it is not highlighting correctly).


it's an issue with the highlight color. I'll get that fixed shortly.


This looks very cool! how is autocomplete feature implemented?


Right now there is no autocomplete inside the text editor, but that will be added soon. DuckDB itself provides an extension that adds a table function that can be queried for completions like so:

select * from sql_auto_complete( 'select ra' )


I'm not in love with this style of UI design in terminals:

https://harlequin.sh/_app/immutable/assets/export.a0e81d27.p...

Every item in the form takes 4 lines (I think?) whereas in more traditional curses UI they would be packed to one line per item, the scrollbar could have easily been avoided here. Smaller nitpick but that style of toggle switches is also form over function; I'd find traditional [X] far more clear/less ambiguous


+1. Lazygit is one of the few tools that does the TUI just right. Super functional UI + Easy discoverability = Chef's kiss.


TIL about LazyGit. Looks excellent. Thanks for the tip, and the UI feedback


Don’t forget lazydocker if you use docker


Agree. The best curses interfaces make liberal use of highlighting and color, rather than box drawing characters, which take up too much real estate. Besides lazygit, which rocks, I would even point to the old DOS versions of MS Word, which had popups that were information dense but readable.


Color can be tricky too. Colorblind or bad contrast. Terminals do bold, italic, underline and strikethrough.


fair feedback!


I love this and will definitely try it out! Although I admit I'm a little puzzled when people simultaneously want to do a TUI but also design things so there is generous (excessive, actually) whitespace around everything to create the illusion of "minimalism" or "comfort".

It's a TUI! It should be buzzing with numbers, packed with information, sparing with space and using every pixel possible. btop[1] is a great example imo — one of the best.

[1]: https://github.com/aristocratos/btop


Thanks for the feedback; gotten some of the same here.

I've tried to be really thoughtful about using space and getting value out of every pixel. There are some features in Harlequin that make it nice on small screens, even with the padding:

1. Scrolling is supported in every widget 2. You can show or hide the left side bar with ctrl+b or F9 3. You can enter "full screen mode" in the query editor or results viewer with F10

Beyond the main screen, the interface could be more compact. Especially the export and help modals. I'm considering tweaks to the design.


> It should be buzzing with numbers, packed with information, sparing with space and using every pixel possible.

No. It should be useful, and if it’s not “buzzing with numbers” and causing me a seizure, well, thank fucking God.

If you want something like that, go write it yourself. I, for one, find the folks writing quiet terminal interfaces to be a massive relief and a tremendous respite from the madness of our seemingly increasingly incoherent, incompetent, and distracting world.


If anyone here is using DuckDB in production i'd love to hear what your stack looks like over the entire lifecycle of extract->transform->load.


We orchestrate our ETL pipelines with dagster. We only use duckdb in a few of them but are slowly replacing pandas etls with it. For some of our bigger jobs we use spark instead.

Essentially it's: 1. Data sources from places such as s3, sftp, rds 2. Use duckdb to load most of these with only extensions (I dont believe there's one for sftp, so we just have some python code to pull the files out.) 3. transform the data however we'd like with duckdb. 4. convert the duckdb table to pyarrow 5. Save to s3 with delta-rs

FWIW, we also have this all execute externally from our orchestration on an EC2 instance. This allows us to scale vertically.


This is very cool!.

Last time I checked duckdb didn't have the concept of a metastore so do you have an internal convention for table locations and folder structure ?.

What do you use for reports/visualizations? notebooks ?.


Yeah, dagster has a concept of metadata and assets so we have some code that'll map dagster's own logical representation to physical s3 locations.

Reports and viz varies a lot, the finance department uses tableau where as for more 'data sciencey' stuff we normally just use notebooks.


Hadn't heard of that dagster feature, thats really cool.


It's great as: 1. An ephemeral processing engine. For example, I have a machine learning pipeline where I load data into a DataFrame, and then I can use DuckDB to execute SQL on my DataFrame (I prefer both the syntax and performance to Pandas). 2. A data lake processing engine. DuckDB makes it very easy to interact with partitioned files. 3. A lightweight datastore. I have one ETL pipeline where I need to cache the data if an API is unavailable. I just write the DataFrame to a DuckDB database that is on a mounted network filesystem, and read it back when I need it.


On a similar point, are people using the actual duck database format or sticking with Parquet? I love everything about DuckDB, but I feel more comfortable keeping things in an existing format.

My only work with it to date has been to load-analyze-usuallydelete to refresh, so I do not require any db mutability. Outside of mutability, not sure if there are any obvious wins with the format.


It's a bit faster and easier than parquet, but right now the format is unstable, which is a huge downside and makes it unsuitable for medium/long-term storage. After DuckDB v1, they'll keep the format stable and then I think its popularity will increase dramatically.


I'm using it for site-search analytics. Originally I was using MariaDB but because we do search-as-you-type there's a complex aggregation query to identify the "real" searches, and with 1000 entries that was taking 10s of seconds to run. Materialized views would've fixed that but...

My stack is now JSON files containing the search data. At present I'm loading these into DuckDB and pre-aggregating due to a bug I found in DuckDB, not because it was needed for performance.


I've been using it for taking output from our data validation steps and bundling that up with the data that was validated into one neat artifact we can download if there is an issue and explore manually.


duckDB is one of the built-ins for count.co, which i've come to love


We use it to sort parquets out of core and then the arrow interface to read into Python and export as geoparquets


This looks super handy, I will definitely take a moment to try it out.

For those asking what DuckDB is: columnstore databases like DuckDB may be slower at data ingestion, but are very quick at multi-GB sums, counts, and aggregations.


This looks great! I’ve been using Motherduck for a while now, glad to see more things being built with it in mind.


I need this but for C++, I kinda miss the old borland turbo C++ ide. mostly what I care about is terminal access and code navigation but other than some vim + plugins monstrosity I cant find any.


Same. Thousands text editors, but not a single IDE.

And they all get SO close, just give a look to SetEdit screenshots: https://setedit.sourceforge.net/

I don't know if it can be of any help for you, but at the moment on the terminal I mostly use:

- Tilde: https://github.com/gphalkes/tilde

- Textadept: https://github.com/orbitalquark/textadept

Or "Geany IDE" on desktop environment (while waiting for lapce.dev to get better), I tend to stay away as much as possible from VS Codium, but everyone else seems to love it and already forgot about Atom, few seems to realise how Microsoft really is.

Maybe the plot twist is that you have to accept in your heart that "writing text on anything, is the real IDE", and transcend to writing on nano!


A rich text-based UI feels like it has all the downsides of a GUI while substantially lacking the upsides.

For example, Tilde. It seems nice and maybe it has nice features, but what about these features is so highly dependent on being accessible through a "menu"? Do you reach for the "menu" all the time? Or are there fancy "dialogs"?

This feels like Vim, but with some strange plugin that provides a "menu". I'd think why not just jump ahead and ditch the quasi-GUI and just learn some shortcuts?

Honestly not trying to be difficult here. I want to understand the mindset, because you are definitely not alone.


There’s Motor, but I don’t know if the project is still being maintained.

https://github.com/rofl0r/motor


Very complete IDE - would be very useful for other RDBMSes.


Thanks! Hopefully coming to v2 ... https://github.com/tconbeer/harlequin/issues/230


Absolutely love the logo. I'd like to see more projects/startups choosing logos/themes with this level of personality.

Project looks rad too, but I'm just here to appreciatively bike shed.


I don't mean to minimize the effort and contribution of this project in any way, and I understand the author is scratching their own itch, but I'm curious why this is specific to DuckDB? From skimming the homepage, it has generic querying capabilities. Isn't there a generic DB driver that can be swapped out and the TUI used on top of another DBMS?


DuckDB provides some really nice python APIs that other databases don't have, so it was a very easy way for me to get started. I'm considering/planning extending to other DBs using plugins. More info here: https://github.com/tconbeer/harlequin/issues/230


I quite like the dbcli.com family of TUI DBMS tools. I personally use pgcli a lot.


Apache Arrow recently introduced ADBC for this.


Perhaps slightly off topic, but I'm not familiar with DuckDB. I don't really understand what it is for, and their site is doesn't really explain it well either.

What is the advantage of using this instead of something well-proven and battle tested like an in memory SQLite database?


DuckDB is a columnar db and focused on analysis queries. So it is more optimized for those workflows. Of course for many things sqlite is fine but if you have a really large db with complex queries duckdb should be faster.


DuckDB also compresses the data. I've ingested ≈ 1 TB of apache-like access logs into a duckdb file that was small enough to load completely into ram on my local machine (like 10-15 GB). It was blazingly fast to query over.

(The large spark cluster was still ingesting the data when I was done with my analysis, only working on my local laptop)


Love the passion and enthusiasm behind this project. Keep it up!


Thank you!


Kinda off topic but how do I install these pip sourced command line applications such that they are available regardless of which virtual environment I am using?


As mentioned in Getting Started, with `pipx`:

`pipx install harlequin`

From the pipx documentation:

> This automatically creates a virtual environment, installs the package, and adds the package's associated applications (entry points) to a location on your PATH.

It makes the command available globally, but sandboxes the package in its own virtual environment.

https://github.com/pypa/pipx


what are the benefits of DuckDB?


Databases can usually be split into one of two types; OLTP (row-based) or OLAP (columnar). OLTPs are used mostly for transactional workloads whereas OLAP is mostly used for analytics.

Here goes...

Take 1: sqlite is to Postgres, what duckdb is to Snowflake/BigQuery.

Take 2: In a similar way that sqlite is an in-process/memory OLTP, duckdb is an in-process/memory OLAP.

I should mention some caveats/exceptions/notes about my statements above:

- there are OLAP projects out there that use Postgres as their basis.

- HTAPs are DBs that allow you to define tables as either row-based or columnar.

- duckdb works with sqlite tables, and its SQL is heavily based on postgres SQL

- duckdb 0.9.0 is being released next week :)

- It seems duckdb is poised to become an almost ubiquitous component of the analytical stack. Apache Iceberg, dbt, ...


it’s like a columnar store sqlite, better performance for analytical data


It’s similar to sqlite but can open multiple file formats.


In addition to the other examples given, it's a quick way to run sql commmands against a CSV, and lets you quickly export to Parquet as well run queries against Parquet files. If you're going to be using Parquet but want to run some quick tests (think setting up an Athena database) it's nice to do it ad hoc rather than having to setup a full ETL process first.


I like it for writing easy sql locally transforming data frames without having to think too hard about it.


Love the SVG's for showing of themes.

How are those generated?


Textual provides a utility for screenshotting apps. I loop over a list of themes and take the screenshots. You can see the code for that here: https://github.com/tconbeer/harlequin/blob/e666319b7a964f026...


Thanks!


Love the simplicity yet functionality.


Is there something like this for XML?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: