More

samsk · 2024-09-22T23:12:45 1727046765

BF is slightly hard to read, more like a well written Perl.

IMHO any solid enteprise should use Ook! or similar substitution - power of Perl, with verbosity of COBOL !

https://esolangs.org/wiki/Ook!

shawn_w · 2024-09-23T05:13:02 1727068382

You can embed BF directly in your perl code!

https://metacpan.org/pod/Acme::Brainfuck

There's an Ook! interpreter too but it's more limited.

https://metacpan.org/pod/Language::Ook

wiz21c · 2024-09-23T11:57:49 1727092669

If only they'd used the f-word instead of Ook...

ArnoVW · 2024-09-23T12:25:08 1727094308

I believe it’s a reference to a Discworld personage

The Librarian is known for his violent reaction whenever he hears anyone refer to him as a "monkey" (orang-utans are apes). He speaks an elaborate language whose vocabulary consists of the single word Ook (and its antonym "eek" - where "ook" means yes, "eek" tends to mean no)

https://discworld.fandom.com/wiki/The_Librarian

wiz21c · 2024-09-23T14:59:27 1727103567

waho, didn't know ! Thanks for explaining.

wiz21c · 2024-09-23T11:55:19 1727092519

Seriously, why ? :-)

samsk · 2024-08-22T05:27:58 1724304478

I'll be afraid when AI takes a bug report with that typical one sentence problem description (like 'Paypal payments are not being processed, we loose 30k a day'), and fixes the problematic one line of code that the AI wrote before.

samsk · 2024-08-06T19:43:12 1722973392

For internet user facing full-text search I would always prefer to use a separate tool and not a SQL database, because

- the fulltext tool, can and should hold only 'active' data

- as it has only active data, data size is usually much much smaller

- as data size is smaller, it better fits in RAM

- as data size is smaller, it can be probably run on poorer HW the full ACID db

- as the indexed data are mostly read-only, the VM where it runs can be relatively easily cloned (never seen a corruption till now)

- as FTS tools are usually schema-less, there is no outage during schema changes (compared to doing changes in ACID db)

- as the indexed data are mostly read-only, the can be easily backup-ed

- as the backups are smaller, restoring a backup can be very fast

- and there is no such thing as database upgrade outage, you just spin a new version, feed it with new data and than change the backends

- functionality and extensibility

There is probably more, but if one doesn't needs to do a fulltext search on whole database (and you usually don't), than its IMHO better to use separate tool, that doesn't comes with all the ACID constraints. Probably only downside is that you need to format data for the FTS and index them, but if you want run a serious full-text search, you will have to take almost the same steps in the database.

On a 15y old side project, I use SOLR for full-text search, serving 20-30k/request per day on a cheap VM, and PostgreSQL is used as primary data source. The PostgreSQL has had several longer outages - during major upgrades, because of disk corruption, because of failed schema migrations, because of 'problems' between the chair and keyboard etc... During that outages the full-text search always worked - it didn't had most recent data, but most users probably never noticed.

whalesalad · 2024-08-06T20:07:17 1722974837

playing devils avocate

> the fulltext tool, can and should hold only 'active' data

very possible with postgres, too. instead of augmenting your primary table to support search, you would have a secondary/ephemeral table serving search duties

> as data size is smaller, it better fits in RAM

likewise, a standalone table for search helps here, containing only the relevant fields and attributes. this can be further optimized by using partial indexes.

> as FTS tools are usually schema-less, there is no outage during schema changes (compared to doing changes in ACID db)

postgresql can be used in this manner by using json/jsonb fields. instead of defining every field, just define one field and drop whatever you want in it.

> as the indexed data are mostly read-only, the can be easily backup-ed

same for postgres. the search table can be exported very easily as parquet, csv, etc.

> as the backups are smaller, restoring a backup can be very fast

tbh regardless of underlying mechanism, if your search index is based on upstream data it is likely easier to just rebuild it versus restoring a backup of throwaway data.

> The PostgreSQL has had several longer outages - during major upgrades, because of disk corruption, because of failed schema migrations, because of 'problems' between the chair and keyboard etc...

to be fair, these same issues can happen with elasticsearch or any other tool.

how big was your data in solr?

samsk · 2024-08-06T21:39:38 1722980378

PostgreSQL is over 90GB, SOLR is under 1GB.

The PostgreSQL has to handle writes, reports, etc..., so I doubt it will cache as efficiently as full-text engine, you'll need to have full or partial replicas to distribute the load.

And, yes, I agree, almost all of this can be done with separate search table(s), but this table(s) will still live in a 'crowded house', so again replicas will be probably necessary at some point.

And using replicas brings new set of problems and costs ;-)

One client used MySQL for fulltext search, it was a single beefy RDS server, costing well over $1k per month and the costs kept raising. It was replaced with a single ~$100 EC2 machine running Meilisearch.

conradfr · 2024-08-07T07:34:24 1723016064

Meilisearch is good, probably sufficient for a vast majority of use cases.

In fact self hosting it is so easy and performant that I wonder how their paid saas is doing.

sidmitra · 2024-08-06T20:11:11 1722975071

While agreeing somewhat with the post above, the answer isn't really so black and white but depends on your context, i.e. scale, app-complexity, search needs, data size etc.

>the fulltext tool, can and should hold only 'active' data

Same can be said about your DB. You can create separate tables, partitions to hold only active data. I assume materialized views are also there(but never used them for FTS). You can even choose to create a separate postgres instance but only use it for FTS data. The reason to do that might be to avoid coupling your business logic to another ORM/DSL and having your team t learn another query language and its gotchas.

> as data size is smaller, it better fits in RAM

> as the indexed data are mostly read-only, the VM where it runs can be relatively easily cloned

> as the indexed data are mostly read-only, the can be easily backup-ed

> as the backups are smaller, restoring a backup can be very fast

Once the pg tables are separate and relevant indexing, i assume PG can also keep most data in memory. There isn't anything stopping you from using a different instance of PG for FTS if needed.

> as FTS tools are usually schema-less, there is no outage during schema changes

True. But in practice for example ES does have schema(mappings, columns, indexes), and will have you re-index your rows/data in some cases rebuild your index entirely to be safe. There are field types and your querying will depend on the field types you choose. i remember even SOLR did, because i had to figure out Geospatial field types to do those queries, but haven't used it in a decade so can't say how things stand now.

https://www.elastic.co/guide/en/elasticsearch/reference/curr...

While the OPs point stands, in a sufficiently complex FTS search project you'll need all of the features and you'll have to deal with the following on search oriented DBs

- Schema migrations or some async jobs to re-index data. Infact it was worse than postgres because atleast in RDBMS migrations are well understood. In ES devs would change field types and expect everything to work without realizing only the new data was getting it. So we had to re-index entire indexes sometimes to get around this for each change in schema.

- At scale you'll have to tap into WAL logs via CDC/Debezium to ensure your data in your search index is up-to-date and no rows were missed. Which means dealing with robust queues/pub-sub.

- A whole another ORM or DSL for elasticsearch. If you don't use these, your queries will soon start to become a mish-mash of string concats or f-strings which is even worse for maintainability.

https://elasticsearch-py.readthedocs.io/en/v8.14.0/ https://elasticsearch-dsl.readthedocs.io/en/latest/

- Unless your search server is directly serving browser traffic, you'll add additional latency traversing hops. In some cases meilisearch, typesense might work here.

I usually recommend engineers(starting out on a new search product feature) to start with FTS on postgres and jump to another search DB as and when needed. FTS support has improved greatly on python frameworks like Django. I've made the other choice of jumping too soon to a separate search DB and come to regret it because it needed me to either build abstractions on top or use DSL sdk, then ensure the data in both is "synced" up and maintain observability/telemetry on this new DB and so on. The time/effort investment was not linear is and the ROI wasn't in the same range for the use-case i was working on.

I actually got more mileage out of search by just dumping small CSV datasets into S3 and downloading them in the browser and doing FTS client side via JS libs. This basically got me zero latency search, albeit for small enough per-user datasets.

samsk · 2024-08-06T22:00:45 1722981645

Yes, it always depends on application and purpose.

But once you will have to deal with a real FTS load, as you say, you have to use separate instances and replication, use materialized views etc.. and you find your self almost halfway to implementing ETL pipeline and because of replicas, with more complicated setup than having a FTS tool. And than somebody finds out what vector search is, and ask you if there is an PG extension for it (yes it is).

So IMHO with FTS in database, you'll probably have to deal with the almost same problems as with external FTS (materialized views, triggers, reindexing, replication, migrations) but without all its features, and with constrains of ACID database (locks, transactions, writes)...

Btw. I've SOLR right behind the OpenResty, so no hops. With database there would be one more hop and bunch of SQL queries, because it doesn't speaks HTTP (although I'm sure there is an PG extension for that ;-)

samsk · 2024-08-06T08:18:25 1722932305

SEEKING WORK | Remote | Worldwide (EU-based, GMT+1) | Part-Time/Full-Time

Backend developer, SQL expert, Technical lead, Architecture consultant (CTO for hire), DevOps engineer with 15+ years of experience on multiple projects and multiple roles..

Language: English, German

Areas of expertise:

* Fulltext/geospatial search systems with SOLR/ElasticSearch/...

* Cloud/On-premise asset inventory (CI creation) for ITSM systems (with https://github.com/cloudinventario/cloudinventario)

* HA Prometheus monitoring systems integration with ITSM systems (preferably iTop), with monitoring configuration over ITSM, CI discovery, autoconfiguration...

* ETL pipelines (data collection, transformation, enrichment, BI reporting...), DWH Building

* API Gateways (Kong, Apisix, OpenResty...)

* Automation with Ansible / Terraform

* SQL data model and query optimization

* Open source systems integration, glueing together MVP using various open source components

* Skills: Perl, C, Bash, Python, Lua, Go, Java, NodeJS, PHP...

Github: https://github.com/samsk/

Email: ja.hn4[at]mailgw.com

samsk · 2024-07-06T21:08:07 1720300087

Web 3.0 failed, lets pour the money in the new black hole ;)

Ankaios · 2024-07-06T21:16:24 1720300584

I guess it's time for someone to start selling token tokens.

samsk · 2024-07-05T15:18:18 1720192698

I've worked for several betting companies, and I don't think it hurt me anyhow. And once you have experience with the industry, other betting companies are more likely to pick you when hiring. The same is true for banking (but probably also other industries).

samsk · 2024-05-02T11:25:36 1714649136

SEEKING WORK | Remote | Worldwide (EU-based) | Part-Time/Full-Time

Backend developer, SQL expert, Technical lead, Architecture consultant (CTO for hire), DevOps engineer with 10+ years of experience on multiple projects.

Language: English, German

Areas of expertise:

* SQL data model and query optimization

* Fulltext/geospatial search systems, with autocomplete, REST/GraphQL APIs on SOLR or Elasticsearch

* Cloud/On-premise asset inventory (CI creation) for ITSM systems (with https://github.com/cloudinventario/cloudinventario)

* HA Prometheus monitoring systems integration with ITSM systems (preferably iTop), with monitoring configuration over ITSM, CI discovery, autoconfiguration...

* ETL pipelines (data collection, transformation, enrichment, BI reporting...), DWH Building

* API Gateways

* Automation with Ansible / Terraform

* Open source systems integration, glueing together MVP using various open source components

* Perl expert ;-)

Github: https://github.com/samsk/

Email: ja.hn4[at]mailgw.com

samsk · 2024-04-15T09:10:46 1713172246

Maybe I'm old, but do we really watch the movies because they are sharper, have vivid colors, etc or because of the story ?

On the other hand, I would probably pay for an AI that will 'undark' that new super artistic movies, because some of them, have worse lighting than the Blair With Project...

grujicd · 2024-04-15T09:27:13 1713173233

Super dark scenes might work on OLED screen, but every projector I saw including those in theatres can't display real black and darkest shades are always a problem. It's not a problem if you have bright parts in the same scene since eye will adapt. But everything is dark it won't look good in cinema. That affected how movies were shot, lighted, which kind of scenes were filmed at all. I wonder if some of those newer movies are intended to look better on modern TVs than in movie theather?

bamboozled · 2024-04-15T09:33:28 1713173608

You're not old ,people just have a hammer and are looking for a nail.

We'll see so much of this in the next few years, optimizing everything to the point of boring. Perfect pop songs, perfect movies, perfect novels and on it goes.

mrguyorama · 2024-04-15T15:58:40 1713196720

How many people watched Dune 1 and 2 because of a story that has been around for decades and already had one interesting film interpretation? How many people watched Avatar for the story?

The existence of IMAX should be a hint that there is value in very crisp visuals.

However, this stupid tiktok filter is NOT "very crisp".

acuozzo · 2024-04-15T15:21:01 1713194461

> Maybe I'm old, but do we really watch the movies because they are sharper, have vivid colors, etc or because of the story ?

Where would, e.g., Koyaanisqatsi fit then?

tiptup300 · 2024-04-15T12:14:08 1713183248

a lot of the people who purchase a lot of films buy them because they are sharper, have more vivid colors.

Who knows if those whales are even watching most of their own collection.

samsk · 2024-04-01T15:03:25 1711983805

This! You come to a fintech to review 'some' performance issues and find a stupid data model and transaction processing completely ignoring how databases work and what limitations that concrete database has ! But hey, look, we used this shiny tech and that newest stack, where is the problem ???

samsk · 2024-03-17T17:03:52 1710695032

Ouch, just remembered I've missed my regular yearly car wash. Don't tell I should do such a unnecessary and time consuming thing more often.