Datasette’s new JSON write API: The first alpha of Datasette 1.0 (simonwillison.net)
226 points by simonw on Dec 2, 2022 | 18 comments



I'm really pleased with the Hacker News scraping demo in this - it's an extension of the scraper I wrote back in March, using shot-scraper to execute JavaScript in headless Chrome and write the resulting JSON back to a Git repo: https://simonwillison.net/2022/Mar/14/scraping-web-pages-sho...

My new demo then pipes that data up to Datasette using curl -X POST - this script here: https://github.com/simonw/scrape-hacker-news-by-domain/blob/...
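
A minimal sketch (not the actual script) of what that POST looks like from Python, using the insert endpoint described in the post. The instance URL, database/table names ("news"/"items") and the DATASETTE_TOKEN environment variable are placeholders, not the real ones:

    # Rough sketch: push scraped JSON rows to Datasette's write API.
    # Instance URL, database/table names and token variable are illustrative.
    import json, os, urllib.request

    rows = json.load(open("items.json"))  # the JSON produced by shot-scraper

    req = urllib.request.Request(
        "https://example.datasette.io/news/items/-/insert",
        data=json.dumps({"rows": rows}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['DATASETTE_TOKEN']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    print(urllib.request.urlopen(req).read().decode("utf-8"))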


Just a small note on the shot-scraper post (very cool stuff!): You can use Array.from(xs, fn) instead of Array.from(xs).map(fn). Same semantics but it happens in one pass with less garbage.


I keep on forgetting that shortcut, thanks!


Very satisfied Datasette user here. There is no better way to immediately share structured data with a minimum of infrastructure. I have personally used it on tables with 20-50 million rows without issue.

As hinted in the opening, I am wondering if I could utilize the new API as a low-performance alternative to Redis. All of the SQLite guarantees, easy network access, and optional backups through Litestream.
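
A rough sketch of what a key/value layer over the API could look like, assuming a "kv" table with "key" and "value" columns in a database called "cache" on a hypothetical instance (upsert hasn't landed yet, so "set" is a plain insert here - see the end of the thread):

    # Sketch of a low-performance Redis-ish key/value store on Datasette's API.
    # Table/database names, instance URL and token variable are assumptions.
    import json, os, urllib.request

    BASE = "https://example.datasette.io/cache"
    TOKEN = os.environ["DATASETTE_TOKEN"]

    def kv_set(key, value):
        # Write via the insert endpoint; repeated sets of the same key would
        # need upsert once it ships.
        req = urllib.request.Request(
            f"{BASE}/kv/-/insert",
            data=json.dumps({"rows": [{"key": key, "value": value}]}).encode(),
            headers={"Authorization": f"Bearer {TOKEN}",
                     "Content-Type": "application/json"},
            method="POST",
        )
        urllib.request.urlopen(req)

    def kv_get(key):
        # Read back through the regular table JSON endpoint, filtered by key.
        url = f"{BASE}/kv.json?key__exact={key}&_shape=array"
        with urllib.request.urlopen(url) as response:
            rows = json.load(response)
        return rows[0]["value"] if rows else None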


I'd never really thought about it as an alternative to Redis, but yeah, if you can handle the huge performance decrease (which for many projects I imagine wouldn't matter in the slightest) that could maybe work pretty well!


Heh, well I am a solo developer with very low performance requirements, so fewer moving pieces is the name of the game. Redis Just Works, but is yet another thing to manage. I know how to query SQL – I interface with Redis so infrequently that I have to look up the commands every time.

If/when the “library” feature gets incorporated, I could even imagine something crazy like using a library of SQLite databases to handle different backing services. ‘Redis’ cache server, job scheduler, log endpoint. Madness, but these are the idle thoughts before the caffeine has fully circulated.


That is really cool, and I'll definitely be using it!

Since I found out about sqlite-utils being able to trivially put JSON into a sqlite database without needing to stop and write out a whole schema, I've used it a few times. In the process I did think to myself, "why isn't there anything that just lets me dump JSON into an endpoint?". I didn't get around to doing it, and I actually thought maybe it wasn't such a great idea if it hadn't been done yet, because it seemed so obvious. Glad to see I'm not crazy :)
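
For anyone who hasn't tried it, a minimal sketch of that sqlite-utils pattern, with placeholder file and table names:

    # Dump a list of JSON objects into SQLite; sqlite-utils infers the schema.
    import json
    import sqlite_utils

    rows = json.load(open("scraped.json"))       # any list of dicts
    db = sqlite_utils.Database("data.db")
    db["items"].insert_all(rows, alter=True)     # alter=True grows the table to fit new keys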

The flexibility and ease of getting different output formats (including charts!) from Datasette is something I hadn't considered, and it looks amazing!


Something I've been contemplating is whether I should add a mode that lets you make unauthenticated write API calls from localhost - like Redis and MongoDB do.

It would need to be an explicit opt-in thing because I've seen what happens if you release software that is open by default, and it's not good!


Or a bit more generically, having a configurable allowlist of hosts, so I can make my ESP32 just a dumb "sensors on a microcontroller" thing pushing data to an RPi on my local home network, which has static IPs provided to certain devices via DHCP. Having to add an auth header isn't a great deal of work, but it'd be handy to be able to try things out quickly. IMO the great thing about all this is how frictionless it is.


I like that: I could have a setting where you provide "localhost" but can also provide other host names and IPs too.


That approach also makes things more docker friendly.


Unfortunately the Host header is spoofable.


You mean the HTTP host header?

Yeah that's a good call: I need to do these checks only based on IP.

I could allow users to enter host names which I would resolve to the IP before checking though, which matches how networking is used in things like Kubernetes.
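
A generic sketch of that kind of check - not actual Datasette code or configuration, just the standard-library approach of resolving host names to IPs up front and then only ever comparing the connection's remote address:

    # Generic IP-allowlist check (illustrative only, not a Datasette feature).
    import ipaddress
    import socket

    def build_allowlist(entries):
        """Turn host names, IPs and CIDR ranges into ip_network objects."""
        networks = []
        for entry in entries:
            try:
                networks.append(ipaddress.ip_network(entry, strict=False))
            except ValueError:
                # Not an IP or CIDR - resolve it as a host name
                for info in socket.getaddrinfo(entry, None):
                    networks.append(ipaddress.ip_network(info[4][0]))
        return networks

    def is_allowed(remote_addr, networks):
        # Decision is based on the remote IP only, never the Host header
        addr = ipaddress.ip_address(remote_addr)
        return any(addr in net for net in networks)

    allowlist = build_allowlist(["localhost", "192.168.1.0/24"])
    print(is_allowed("192.168.1.23", allowlist))   # True
    print(is_allowed("203.0.113.5", allowlist))    # False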


I like the idea of having some way to do unauthenticated calls, but would it be possible to layer on some additional granularity or would that be asking too much when the security infrastructure already exists?

That is, I might want to be able to append data without any ceremony (logs?), but schema changes or data mutation would require credentials.


The big feature in the next alpha will be the ability to create tokens that have a subset of permissions - eg can only write to specific tables: https://github.com/simonw/datasette/issues/1855

But maybe a mechanism for easily specifying that for anonymous localhost users would be useful too?


Just dropping in (long time HN lurker, first time HN poster) to say thanks for such a great capability in Datasette and datasette-lite. I use sqlite_utils in numerous solutions to insert/upsert various JSON data into SQLite databases for various purposes. I use Datasette/datasette-lite to host data products for consumers who need rapid insight but are not "SQL jockeys."

I think this mutation capability would be great for some of my data logging workflows. I push JSON-formatted telemetry data from different types of devices/sensors over Tailscale to a central collector. Combined with ClickHouse's adapter for SQLite databases, this could make for a very nice self-contained data reporting framework where all the source data flows through Datasette to a set of SQLite files organized by topic. It will be important to me, when deploying new event senders, to have straightforward authentication so I can bake pre-authenticated connection schemes into standard images that are installed without customization. Alternatively, configuring Datasette so that a specific set of IP addresses/CIDRs is simply granted insert on specific dbs/tables would work fine.

Thanks again. Great stuff!


I am a little lazy so only skimmed the article: is there UPSERT support?


Not yet, but I hope to ship that in the next alpha in a few days time: https://github.com/simonw/datasette/issues/1878



