Hacker News | riordan's comments

Chronicling America is great, but as you’ve noticed it’s limited to issues that are in the public domain. It’s not just LC; the National Digital Newspaper Program[1] has been funding hundreds of libraries across all 50 U.S. states since the mid-’00s to make it possible.

Australia’s national library offers Trove[2], which has a huge collection of Australian public domain newspapers.

Most of their repository is likely funded by your tax dollars and is there for the public to use.

[1]: https://www.neh.gov/divisions/preservation/national-digital-...

[2]: https://trove.nla.gov.au/newspaper/


Trove is fantastic. It came in handy very recently for the same project I used the Library of Congress for[1]. The Museums of History New South Wales has a great online archive as well[2].

[1]: https://ajxs.me/blog/The_Identity_of_The_Sanctimonious_Kid.h...

[2]: https://mhnsw.au/collections/state-archives-collection/


It’s good to hear that PWAs are still supported in Firefox on Android, even though they’re gone from the desktop version.

From my [fairly-out-of-the-loop-for-the-past-few-years] vantage point, Mozilla’s been a lot less invested in the PWA ecosystem since they abandoned their Firefox Phone / Boot2Gecko initiative, which was intended to create a middle tier between the expensive smartphones of the early 2010s and the ubiquitous, cheap feature phones (flip phones, classic Nokia candy bars), and to expand access to the web across the world with it.

All the apps were PWAs, which made it simple to build out. Eventually Mozilla stopped the project, but KaiOS became a commercial implementation and it still runs on a fair number of feature phones to this day.

But without that pressure for PWA support in Firefox as a critical mobile feature, it was largely serving as an expensive bookmark launcher in the Firefox code base so that folks could alt-tab to the small number of sites that supported it on their desktops. Not a noble end for Mozilla’s support for what should have been / could have been an incredible leverage point for the web ecosystem and open development.


It’s not. If it holds up well in terms of operating it and extending it when things change, then it’s not “too simple”. You’ve just built on 4 extremely resilient and scalable technologies which have made the kind of “Big Data” hoop-jumping of the past 20+ years rather moot.

You know the drill: SQLite and DuckDB mean you’ve got a transactional database and a data warehouse that live on a single machine and, unless you’re dealing with terabytes, run analytics at performance comparable to BigQuery and Snowflake.

No need to write an ETL pipeline that runs on a Hadoop cluster, provision that Hadoop cluster, or spin up a Hive/Pig instance for analysis. Nah, it fits on *your* computer and is scripted in the same language for the pipelines as for the analysis, without a performance cost.
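The single-machine point is easy to demo; here's a minimal sketch using Python's stdlib sqlite3 module (DuckDB's Python API follows the same connect-and-query shape; the table and values below are made up for illustration):

```python
import sqlite3

# Everything lives on one machine: transactional writes and
# analytical reads hit the same embedded database.
con = sqlite3.connect(":memory:")  # use a file path for persistence
con.execute("CREATE TABLE events (user_id INTEGER, amount REAL)")
con.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, 10.0), (1, 5.0), (2, 7.5)],
)
con.commit()

# "Analytics" is just another SQL query -- no cluster, no ETL hop.
rows = con.execute(
    "SELECT user_id, SUM(amount) FROM events "
    "GROUP BY user_id ORDER BY user_id"
).fetchall()
print(rows)  # [(1, 15.0), (2, 7.5)]
```

Pipeline and analysis share one language and one process, which is exactly what makes the Hadoop-era hoop-jumping feel moot at this scale.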

If you need to scale it, it’s not technical scaling, it’s team-knowledge scaling (still hard, but not a fundamental stumbling block). So bring in dbt/Dagster (or Airflow) and now it’s got supported frameworks that others already know and use.


Thank you for your valuable feedback - much appreciated!

Team knowledge scaling is hard - I totally agree and learned a lot of lessons.

Top management usually works "email only". No matter what cool dashboard you’re building: they don’t use it, because they’re working almost exclusively on their phones.

And that's one tough challenge in my opinion: making data easy to understand on small screens.

Then there is this group of CFOs … they love to connect their Excel to a live datastream. Once. Because at some point they return to static sheets just to prove that a 35 MB Excel file shows their latest forecast.


Likewise; this ticks so many boxes for me, but as a hippy-dippy non-Gmail user I too have to second the IMAP/JMAP ask.

Gmail absolutely hits that sweet spot of API capabilities and where the users are, so I can’t fault the project creators (or most every email client business these days) for building first (or exclusively) for it.

That said, seeing Outlook as coming soon on their login page is reassuring that they’re building in a way that won’t tie them to Gmail forever. And while few email providers outside of Fastmail are offering JMAP support, as an API it’s much closer to the degree of functionality expected by anyone building on top of Gmail’s API today. A great new client that gives a big section of the public a better way to “do email” might be what it takes for more services to start offering JMAP.
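For context, a JMAP request is just a JSON document POSTed to the server's API endpoint, which is part of why it feels so close to what folks building on Gmail's API expect. A minimal sketch of an Email/query method call (the account and mailbox ids here are hypothetical placeholders):

```python
import json

# JMAP (RFC 8620/8621): a request declares the capabilities it uses
# and a list of method calls, each [name, arguments, call-id].
request = {
    "using": ["urn:ietf:params:jmap:core", "urn:ietf:params:jmap:mail"],
    "methodCalls": [
        ["Email/query",
         {"accountId": "u123",                 # hypothetical account id
          "filter": {"inMailbox": "inbox-id"}, # hypothetical mailbox id
          "limit": 10},
         "c1"],
    ],
}
body = json.dumps(request)  # POST this to the server's apiUrl
print(json.loads(body)["methodCalls"][0][0])  # Email/query
```

Compared to bespoke REST surfaces, that one uniform request shape is a big part of what makes JMAP attractive to client authors.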

So hats off to y’all and fingers crossed on incorporating open standards.


Hey Riordan, curious to know specifically what you're looking for?

But right on about Gmail, it's much easier for us to prototype and test than some of the other clients.

Also really appreciate the kind words! Means a lot to us at this early stage. Hopefully you do join the email list or discord so we can keep you posted on our progress!


Supporting standard SMTP and IMAP lets you access the whole universe of mail servers, from Yahoo to Zoho to MXRoute.


This is genuinely awesome. The cross-street-based search is a perfect approach for NYC. Reminds me of how I used to use Google’s SMS service to get directions from about 2004 to 2010.


Sunday service has always been particularly important to the New York City public libraries. Andrew Carnegie’s original deal was that he would fund the construction of the branches, the libraries would run them, and the city would fund seven-day service at them. For a while, the Carnegie branches stayed open seven days a week, even as the systems had to follow through on cutbacks at the non-Carnegie branches.

But that ship sailed long ago. Very few were still able to offer Sunday service before this:

- NYPL (Manhattan/Bronx/Staten Island): 8/92 sites

- Brooklyn Public Library: 8/66 sites

- Queens Public Library: 2/66 sites

(Yes, there are three separate public library systems for New York City. They pre-date the consolidation of the city, and no matter how hard folks have tried, every study on consolidating the three systems into a single organization finds it winds up costing significantly more than the status quo.)


This is a big step forward, and it’s also making me nostalgic for the Friend-of-a-Friend (FOAF)[0] blogrolls of the early ’00s: an RDF-in-HTML standard that could express recommendations and relationships.

That said, I’m glad to see Webmention adoption. It’s got a much clearer purpose than FOAF (it might’ve been a little too expressive) and fits nicely into the current web ecosystem.

[0]: https://twobithistory.org/2020/01/05/foaf.html


> In this context-- the section in article where it says present data is of virtually zero importance to analytics is no longer true. We need a real solution even if we apply those (presumably complex and costly) solutions to only the most deserving use cases (and not abuse them).

Totally agreed, though real-time data put through an analytics lens is exactly where CDWs start to creak and get costly. In my experience, these real-time uses shift the burden from human decision-makers to automated decision-making, and it becomes more a part of the product. And that's cool, but it gets costly, fast.

It also makes perfect sense to fake-it-til-you-make-it for real-time use cases on an existing Cloud Data Warehouse/dbt style _modern data stack_ if your data team's already using it for the rest of their data platform; after all they already know it and it's allowed that team to scale.

But a huge part of the challenge is that once you've made it, the alternative for a data-intensive use case is a bespoke microservice or a streaming pipeline, often in a language or on a platform that's foreign to the existing data team that built the thing. If most of your code is dbt SQL and Airflow jobs, working with Kafka and streaming Spark is pretty foreign (not to mention entirely outside the observability infrastructure your team already has in place). Now we've got rewrites across languages/platforms, which leave teams with the cognitive overhead of multiple architectures & toolchains (and split focus). The alternative would be having a separate team to hand real-time systems off to, and that's only if the company can afford that many engineers. Might as well just allocate that spend to your cloud budget and let the existing data team run up a crazy bill on Snowflake or BigQuery, as long as it's less than the cost of a new engineering team.

------

There's something incredible about the ruthless efficiency of SQL data platforms that allows data teams to scale the number of components per engineer. Once you have a Modern-Data-Stack system in place, the marginal cost of new pipelines or transformations is negligible (and they build atop one another). That platform-enabled compounding effect doesn't really occur with data-intensive microservices/streaming pipelines, which means only the biggest business-critical applications (or skunk-works shadow projects) will get the data-intensive-applications[1] treatment, and business stakeholders will be hesitant to greenlight it.

I think Materialize is trying to build that Modern-Data-Stack type platform for real-time use cases: one that doesn't come with the cognitive cost of a completely separate architecture or the divide of completely separate teams and tools. If I already had a go-to system in place for streaming data that could be prototyped with the data warehouse, then shifted over to a streaming platform, the same teams could manage it and we'd actually get that cumulative compounding effect. Not to mention it becomes a lot easier to then justify using a real-time application the next time.

[1]: https://martin.kleppmann.com/2014/10/16/real-time-data-produ...


Indeed they did.

I also fondly remember how wonderful the web-development deployment workflow was in Coda. This was back before source control was ubiquitous and CI/CD was required for any real production environment. It was the cleanest path I'd seen from saving in your text editor to SFTPing the file to the server. I half recoil in horror and am still in awe of how considered their experience was.


> This was back before source control was ubiquitous

Your memory may be playing tricks on you, it wasn't that long ago. Coda 1 had SVN support out of the box, and at the time SVN was already old and in decline, not to mention CVS and other lesser source control systems. By the time Coda 2 came out, git had won the DVCS battle, and was built into it.

Now, granted, back then I wasn't very keen on version control either, but it's not because it wasn't there, it's because I was young and inexperienced, and didn't know any better. Plus, in those days the only common use for source control was actually, well, source control, unlike today. It was something you had to discipline yourself to use for no immediate benefit.


From my memory, in the mid-2000s version control and automated deploys weren't ubiquitous for the audience Coda mainly targeted: web-site development.

I started my career around that time at a company using Python for web application development. Everything was in SVN (though already in the process of being moved to git), and while there wasn't a CI/CD server, all deploys were done by running a single CLI command. On the other hand, for years to come I saw people and agencies doing websites in WordPress make changes directly in production.

In those circles, which often included people without a heavy technical background, like designers who had learned some PHP/HTML/CSS, Coda was quite a revelation, I think.


100%. Coda was especially useful if you weren't running your server stack on your local dev machine. Hit "Publish" and all your changed files were pushed up automatically. So much better than hunting through an FTP client every time you made a small edit.


Not gaming related, but if you want to know the origins of where everything in computation came from, George Dyson’s “Turing’s Cathedral” is a revelation. It’s the story of the computer, told through interviews with the folks who created the architecture of it all. It’s incredible history.

