adeptima · 2025-11-06T03:04:57 1762398297

10+ years in Japan. The message here is much deeper from my perspective. “Let’s jump on the call” is not the solution. The guy was stripped of his face. I love Japan for being human. Small business bar or restaurant with 3 tables. Not everything should be streamlined for a quick call solution… the process was pushed on his head. Google nemawashi decision making process

alwa · 2025-11-06T03:17:41 1762399061

I did as you suggested with respect to “nemawashi.” I read about that and “ringi,” and I’m glad I did. Even to get just the gist of what I’m sure is a thin interpretation: that nemawashi refers to a “laying-the-groundwork” process of circulating a proposal between peer-level counterparts, before formalizing it and proposing to act on it.

Much less crashing in with it in the form of a “SumoBot,” as Mozilla seems to have done to its non-English communities… (with the disclaimer that I have zero insight into Mozilla’s process here outside of this writer’s account).

It puts a name to a considerate consensus-based way to approach change, that seems humane (and effective) in any culture—leave it to the Japanese to have a specific term for it…

martin_henk · 2025-11-06T03:20:59 1762399259

common sense... no real need for digging into japanese culture and so on. really no idea why Mozilla is so disrespectful to its volunteers. well, that sweet 400m a year from Google... no need for volunteers anymore, eh

alwa · 2025-11-06T03:26:37 1762399597

For sure. Common sense <> common, etc… although it does seem relevant that it was specifically a Japanese-language sub-community who were reacting here.

I have to say it feels like a really familiar, NGO-flavored disrespect, though: “we’re doing this favor for underrepresented language communities,” regardless of whether they want/need it or not.

“There’s only X number of you having to shoulder the load in XX sub-community, don’t you want us to impose a bunch of ‘help’?”

Well, no, if the choice is between a formidable volume of slop and a smaller but well-executed volume of volunteer labor-of-love…

(…I say as a person very much without all sides of the story, and shooting from the hip a bit. I don’t mean to impugn anybody’s intentions, and I imagine at the end of the day we’re all on the same side here.)

shaky-carrousel · 2025-11-06T08:04:27 1762416267

Enlightened despotism, all over again.

toyg · 2025-11-06T10:48:11 1762426091

mixed with good ol' white-man-saviour attitudes.

TheJoeMan · 2025-11-06T03:44:32 1762400672

That reminds me of internet RFC’s… like by the time they are formally published, no the author is not interested in your “comment”.

Arnt · 2025-11-06T03:52:06 1762401126

I've written a few RFCs.

For any RFC, there will be a "comment" after publication from someone who did not take earlier comments seriously enough to read them.

eschatology · 2025-11-06T06:13:18 1762409598

Exactly the attitude described by GP comment

Mind boggling

Arnt · 2025-11-07T04:49:23 1762490963

You may be relieved to hear that there's a straightforward process to have an RFC revised. Step 1 of that process, however, is reading the RFC and the archived email about the RFC.

You can't just arrive after publication, ignore what others said before you, and expect anyone to listen to you.

alwa · 2025-11-06T04:58:21 1762405101

…and, for that matter, there was an earlier draft phase where the author was R’ing For your C. And you could have jumped in then and been more-or-less welcome.

hunter2_ · 2025-11-06T05:37:30 1762407450

Sounds like RFC ought to be the name of that draft phase, rather than a name encompassing all phases, especially not the final phase in which C's are no longer R'd.

dsr_ · 2025-11-06T16:01:19 1762444879

Historical precedent. They assigned a grad student to write up the notes; he wasn't sure he had got everything, so he titled it an RFC.

At this point, as we close in on 10,000 final-stage documents, it's better to pretend that "RFC" is just a name, not an acronym.

eesmith · 2025-11-06T06:27:25 1762410445

Times changed. Historical names did not.

"many of the early RFCs were actual Requests for Comments and were titled as such to avoid sounding too declarative and to encourage discussion.[8][9] The RFC leaves questions open and is written in a less formal style. This less formal style is now typical of Internet Draft documents, the precursor step before being approved as an RFC." https://en.wikipedia.org/wiki/Request_for_Comments

humanrebar · 2025-11-06T10:35:58 1762425358

RFCs can be titled Architecture Decision Records (ADRs) or policies once they are accepted.

xaedes · 2025-11-06T17:25:50 1762449950

> It puts a name to a considerate consensus-based way to approach change

When reading about nemawashi I immediately thought about its usage in software refactoring.

This is something you often intuitively do when making bigger refactors. Lay the foundations before actually doing it. Affected code parts and stakeholders should not be surprised by one big change. Instead they should be consulted before hand, building consensus, modify the planned big refactor itself and preparing the individual parts for it by small changes. Otherwise you will encounter a lot of friction, introduce bugs, etc.

It is very nice to have a proper term for this.

pengaru · 2025-11-06T04:34:35 1762403675

We Americans call this garnering buy-in.

antonymoose · 2025-11-06T17:46:23 1762451183

I’ve more typically heard it as “consensus / coalition building” - but in any case it’s such a sane way to work. No one wants a rude surprise, so why make one for your teammates.

p0w3n3d · 2025-11-06T08:39:37 1762418377

I predict that these times of excessive trust in AI during decision making will be written in history books at some point of time. Providing that there will be books at all.

I already suspect that Duolingo destroyed real people's recordings of Spanish conversations and replaced them with AI. For example I can quite often hear a continental Spanish accent which has never been taught to me before (as I started with Duolingo as a freshman) - it used to be always an American Spanish accent. Wrongly cut conversations are another matter.

rester324 · 2025-11-06T06:24:10 1762410250

I am not sure I am buying this. There is nothing human about japanese business procedures. Most japanese business procedures usually only serve micro managing purposes, and the nemawashi procedure is basically stripping people who were not consulted before, from giving their honest input and impact in the decision making. In my opinion it creates more problems than it solves

ekianjo · 2025-11-06T03:46:51 1762400811

> nemawashi

Long time in Japan too, I would not consider nemawashi as being one of Japan's strengths.

krick · 2025-11-06T04:08:34 1762402114

I can imagine what you mean, but since I am not in Japan, it would be interesting why you feel that way.

rtpg · 2025-11-06T04:33:51 1762403631

long and slow consensus building that weighs existing stakeholder's opinions heavily vs doing "the right thing" from the outset. So you move slowly and end up having very annoying conversations and compromises instead of just pushing something through. And the formal process is just a formality anyways, so then anyone not in the informal chatter just gets to experience the capriciousness anyways

The sort of consensus building ultimately involves having to do stuff to make people's opinions feel taken care of, even if their concerns are outright wrong. And you end up having to make some awkward deals.

Like with all this "Japanese business culture" stuff though, I feel like it's pretty universal in some degrees or another everywhere. Who's out there just doing things without getting _any_ form of backchannel checking first? Who wants to be surprised at random announcements from people you're working with? Apart from Musk types.

But of course some people are very comfortable just ripping the band aid off and putting people in awkward spots, because "of course" they have the right opinion and plan already.

Why context matters in judging whether some practice is good or not.

moonlet · 2025-11-06T06:14:46 1762409686

Who cares if they’re wrong? The point is respect for their opinions and feelings since you’ll have to work with them for twenty years. If you respect them, you get to do what you want to do and they won’t fuck with you or shoot down your proposal.

To be clear this is Japan we’re talking about with the twenty years part. The same thing applies in the US but on smaller timescales though. If people feel appreciated and respected and you have good relationships, they will basically back whatever you want.

rtpg · 2025-11-06T07:23:24 1762413804

To be clear I'm describing a point of view, but not always ascribing to it.

I tend to lean towards thinking backchanneling makes sense as a general vibe, if only because it's a way of doing things that lets people have dignity, and the costs _can be_ low.

rester324 · 2025-11-06T06:29:50 1762410590

I think this is a very naive take. Japanese people will blame you for any failure regardless if you respect them or not. And many times failures happen in japan exactly because people are sitting around doing nothing without acting even when it's urgent to make decision. Backstabbing and toxicity is the major feature of japanese business culture

armada651 · 2025-11-06T05:24:41 1762406681

Move fast and break stuff didn't work out much better though.

rtpg · 2025-11-06T05:28:39 1762406919

Yeah sure, I feel like back channeling stuff is generally just the respectful thing to do, so I'm not on the side of the debate I'm expanding upon in most cases.

Just that lacking context one really can't make that many blanket statements.

ivell · 2025-11-06T07:10:16 1762413016

Not only respectful. Also it ensures that all the different aspects of a decision are considered before making the decision. If not aligned with all parties, we would miss important flaws in the plan. It is just a sensible thing to do.

rtpg · 2025-11-06T07:26:20 1762413980

> If not aligned with all parties, we would miss important flaws in the plan.

I think the difficult cases come when people's interests aren't aligned. If you're coordinating with a vendor to basically detangle yourself from their vendor-specific tooling to be able to move away from them, at some level it doesn't really make sense to read them in on that.

There are degrees to this, and I think you can argue both sides here (so ultimately it's a question of what you want to do), but parties are rarely neutral. So the tough discussions come from ones where one party is going to be losing out on something.

stackedinserter · 2025-11-06T16:52:41 1762447961

But it does work much better, why do you think it didn't?

pezezin · 2025-11-07T11:42:14 1762515734

Yeah, Japanese economy has been quite stagnant since the bubble collapse in 1990. Pretending that their model works is ridiculous.

ekianjo · 2025-11-06T10:00:56 1762423256

You have more than 2 choices

ssivark · 2025-11-06T05:58:33 1762408713

Peter Drucker has an interesting analysis of the "American" -vs- "Japanese" styles of decision-making + alignment, presenting a complementary perspective: https://www.joaomordomo.com/files/books/ebooks/Peter%20Druck...

IMHO the only correct way to measure the effectiveness of decision making is from the quality of executed outcomes. It is somewhat nonsensical to sever decisions from execution, and claim that decisions have been made rapidly if the decision doesn't lend itself to crisp execution. Without that, decisions are merely intentions.

palmotea · 2025-11-06T07:18:28 1762413508

> long and slow consensus building that weighs existing stakeholder's opinions heavily vs doing "the right thing" from the outset.

How do you know what "the right thing" is at the outset without talking to the stakeholders?

I'm dealing with someone's "the right thing" that is actually wrong and dumb. They didn't ask us before rolling out the new "standard."

rtpg · 2025-11-06T07:21:57 1762413717

Some people are very confident in their understanding of a problem! Others will discount the validity of the stakeholders involved having good judgement.

I think most people have at least one issue where they discount one of the stakeholder's judgement, it's all fairly contextual. But hey, if you're the CEO of some company you have the ability to act on that discounting.

jack1243star · 2025-11-06T04:33:07 1762403587

Not OP but the phrase in Japanese also carries a negative connotation, that important issues are decided by a shadow process hidden below the surface, beforehand by those in the loop. Meetings are just for show.

ekianjo · 2025-11-06T13:09:21 1762434561

would take a long time to expand on this in sufficient detail, because it's hard to understand unless you live it and see it. People very rarely speak up in meetings (silence is the norm, just like they were taught in school), and therefore you never get to see exactly what people feel or think about what you are planning to do. So the norm is that most people talk and exchange in other venues (usually behind your back, in smaller groups - I don't mean this in a negative way, it's just what they are used to doing). So when you are set to achieve something, it's a long road of 1:1s, drinks or informal meetings to get to know what they think, what they want, and how they can help you to get there. Multiply this by the number of stakeholders involved, and the politics at play between all of them, and you end up with a very slow and inefficient model to drive decisions and buy-in.

pezezin · 2025-11-07T12:48:50 1762519730

I also live in Japan and everything you wrote is true, but I would add one more point: extreme risk aversion that results in any tiny change getting discussed to ridiculous levels.

agnishom · 2025-11-07T01:13:32 1762478012

Isn't "nemawashi" just a term for building rapport and sensible networking practices, but in Japanese?

trallnag · 2025-11-06T13:03:07 1762434187

"I love Japan for being human". What does this even mean? Immediately followed by something about food?

pezezin · 2025-11-07T11:36:08 1762515368

Be glad he didn't mention that Japan has four seasons...

jesterson · 2025-11-06T07:12:17 1762413137

> Not everything should be streamlined for a quick call solution

If you have a better solution to correct an error or solve a problem than having a call/meeting and openly discuss situation and possible resolutions - I would love to know about that.

port11 · 2025-11-06T07:47:51 1762415271

The response was condescending and very… American. The call ensures what, that you'll be more receiving to their grievances? That nothing is on the record? A lot of people don't want to jump in calls, ever. The initial response should've validated that the community feels slighted, that they should've brought them onboard for the decision making, etc.

Acknowledging the mistake immediately seems like a good start.

latexr · 2025-11-06T10:08:47 1762423727

> The call ensures what, that you'll be more receiving to their grievances?

It ensures you truly understand what the crux of the grievance is and what they would like to happen to get it resolved, instead of being distracted by tangential points.

> That nothing is on the record?

If you’re already assuming malice before the resolution process even had a chance to begin, the conversation has little chance of being productive. Do you know this particular person? Have you interacted with them before?

> A lot of people don't want to jump in calls, ever.

Then say no! But being preemptively mad because someone asked is absurd and does nothing to fix the problem. The asker shouldn’t assume what the other person wants or doesn’t, they should ask. Which is what they did.

> Acknowledging the mistake immediately seems like a good start.

Yes, very much agreed. But you can’t take back what you did, only try to make amends. And that’s very difficult if the other party demands perfection while you’re still even trying to understand the situation.

Almondsetat · 2025-11-06T10:58:13 1762426693

>the response was condescending

in your own opinion

>and very American

from an American company? that's what I'd expect. Should they have brought up some Japanese PR consultant just to reply to a community post?

>acknowledging the mistake immediately

Who says a mistake happened? You? Before apologizing maybe we should understand the problem?

fragmede · 2025-11-06T11:19:08 1762427948

> Should they have brought up some Japanese PR consultant just to reply to a community post?

Yes! It wouldn't even have had to have been a good one to have done a better job. Shit, just find the closest weeb and run it past them.

A developer relations person needs to understand developers so why shouldn't we expect the community person to understand the community they're interacting with?

Mozilla doesn't have the community goodwill to burn, it's hanging on by a thread - so not hiring someone with an idea of how to actually do that job would be penny wise pound foolish.

jesterson · 2025-11-06T08:10:28 1762416628

Ok, what would the perfect reaction be if you were in charge?

I understand people have sympathy inclination to victims, so everyone would assume the victim is good and other side is bad. I have worked long enough with japanese people knowing they can throw unpredictable tantrums.

As a manager, what would be your best course of action to deal with similar situation?

port11 · 2025-11-06T08:20:26 1762417226

Acknowledging the mistake immediately seems like a good start, as I've said.

Life doesn't always have to be from the perspective from “a manager”, these are community volunteers doing untold hours of unpaid work. Just be a person, whose acquaintance is upset you replaced their handmade postcard with an AI-generated one.

jesterson · 2025-11-06T08:41:27 1762418487

Acknowledging a mistake, no matter genuinely or not, doesn't solve the situation. It just makes victim feel good a bit.

Agree on manager view, I was rather putting situation in a wrong perspective. It doesn't change the questions though - what would you do to resolve the situation (not to make the other side feel good)?

port11 · 2025-11-07T11:41:56 1762515716

> Acknowledging a mistake, no matter genuinely or not, doesn't solve the situation. It just makes victim feel good a bit.

This feels very wrong to me, I'm sorry, but I'd be very pissed if you told me such a thing in a personal context. Reminds of Stanley from The Office, who claims he never apologised to any of his wives.

Jach · 2025-11-06T10:50:18 1762426218

Written communication is usually better and allows for more clarity, investigation, preparation, careful thought, and exploring of solutions. When it's not better it's usually because one party doesn't like to read or write and so avoids it as much as they can.

skywhopper · 2025-11-06T19:50:37 1762458637

How is a private call about a community issue an “open” discussion?

watwut · 2025-11-06T10:38:19 1762425499

> If you have a better solution to correct an error or solve a problem than having a call/meeting and openly discuss situation and possible resolutions - I would love to know about that.

I do, actually. You first read what the other person wrote. Then your response will take whatever they wrote into account. If they did not express themselves clearly, you explain what it is that you do not understand. The "We want to make sure we truly understand what you're struggling with." is wholly inappropriate if the only reason you do not understand is that you did not read what they wrote.

Second, you dont suggest the other person is struggling with something, unless they are actually struggling with something. The original post does not show someone struggling at all.

Tl;dr if you want to "openly discuss situation and possible resolutions" you dont start by ignoring what the other person wrote. This response makes it very clear that manager does not intend to openly discuss the situation or possible resolutions, the manager is not taking the complaint seriously at all.

ezoe · 2025-11-06T04:07:16 1762402036

Exactly, this is just a 面子(face) problem.

Also, his demanding of not using his work for AI training is nonsense. Because entire articles, this one included is published under a Creative Commons license.

Didn't he agree on that?

Mozilla must reject his further contributions because he stated he doesn't understand the terms of the Creative Commons license. His wish granted I guess.

wartywhoa23 · 2025-11-06T08:57:01 1762419421

Creative Commons License was created without any AI in mind.

And

> Licensees may copy, distribute, display, perform and make derivative works and remixes based on it only if they GIVE THE AUTHOR or licensor THE CREDITS

ezoe · 2025-11-06T13:57:21 1762437441

Is an ML model binary file created by using copyrighted work as its learning data, a derivative work of the copyrighted work? I don't think so.

wartywhoa23 · 2025-11-06T14:30:12 1762439412

If one takes another's work, cuts it up and makes collages out of it, however multidimensional, what is the piece size threshold that makes the collage non-derivative?

umanwizard · 2025-11-06T16:29:12 1762446552

This is the most fundamentally important question of AI-related law, and nobody knows the answer as it hasn't been tested by any court AFAIK (at least not in the US).

ghssds · 2025-11-06T19:08:39 1762456119

We already know the tribunal will take AI's side, not because of any justice or ethical reason but because of capitalism. They will interpret the law as saying whatever they want the law to say.

kuschku · 2025-11-06T08:52:52 1762419172

Is the AI published under the same CC license, with attribution?

ezoe · 2025-11-06T13:58:49 1762437529

CC only works on things that are copyright protected works. Is an ML model binary file a derivative work of the learning source? I don't think so.

AlexandrB · 2025-11-06T16:29:10 1762446550

Why not?

ezoe · 2025-11-07T07:59:36 1762502376

Because it doesn't.

Japanese copyright law clearly stated this decades ago, and a recent US court ruling favors Anthropic in this regard.

Copyright isn't granted on mere information or thought.

If you take somebody's copyrighted writing, analyze it and publish information such as how many words or sentences are in it or other information about that copyrighted work, that's not a derivative work of the original copyrighted work.

adeptima · 2025-10-15T00:50:54 1760489454

Did research on accent, pronunciation improvement, phoneme recognition, kaldi ecosystem, etc … nothing really changed in the public domain in the past few years. There’s not even an accurate open source dataset. All self-claimed manually labelled datasets with 10k+ hours were partly done with automation. Next issue, models operate in different latent spaces, often with 50ms chunks, while pronunciation assessment requires much better accuracy. Just try to say B out loud - silent part gathering energy in the lips, loud part, and everything that resonates after. Worst part, there are too many ml papers from last-year students or junior phd folks claiming success or fake improvements, etc

The article itself is just a vector projection in 3d space … the actual reality is much more complex.

Any comments on pronunciation assessment models are greatly appreciated

oezi · 2025-10-15T05:40:19 1760506819

You are right and I don't think incentives exist to solve the issues you describe, because currently many of the building blocks people are building are aligned to erase subtle accent differences: the neural codecs, transcription systems such as whisper want to output clean/compressed representations of their inputs.

adeptima · 2025-10-15T22:37:28 1760567848

100% agree

adeptima · 2025-09-12T20:11:37 1757707897

QGIS is a gold standard to verify your tools work fine and data is in a correct format ...

if you are a web based first, you have even better options to build and extend

kepler, protomaps, maplibre-gl-js

https://kepler.gl

https://protomaps.com

https://github.com/maplibre/maplibre-gl-js

the rest can be found on great Qiusheng Wu’s (aka @giswqs) Geo/GeoAI tutorials channels and repos

https://www.youtube.com/@giswqs/videos

https://x.com/giswqs

but what really amazed me is how geo spatial support is growing inside of databases recently

https://duckdb.org/docs/stable/core_extensions/spatial/overv...

all mighty postgis https://postgis.net/docs/manual-3.5/postgis_cheatsheet-en.ht...

https://sedona.apache.org/latest/

https://geoparquet.org/releases/v1.0.0/

and many unlocked datasets compared to other industries

https://docs.overturemaps.org/getting-data/duckdb/

https://www.openstreetmap.org/

https://hub.arcgis.com/search

a lot of great web tools are coming for sure and you still can be 100% of most of your geospatial pipeline

p.s. want to extend the above list with self-hosted tools with minimum or none dependencies on paid APIs, and recommendations are greatly appreciated

adeptima · 2025-08-10T00:44:07 1754786647

miss it heavily! it could read code dumps and was superior for code analysis and todos

adeptima · 2025-08-10T00:42:29 1754786549

checked my network - no one is using GPT 5 Pro ...

any feedback is greatly appreciated!!! especially comparing with o3

mikert89 · 2025-08-10T00:58:03 1754787483

I'm surprised how few people are using it

energy123 · 2025-08-10T02:02:59 1754791379

Are there tight rate limits to GPT-5 Pro or is it in practice uncapped as long as you're not abusive?

Is GPT-5 better than GPT-5 Pro for any tasks?

adeptima · 2025-08-10T00:40:00 1754786400

same sentiments with an article author - gpt5 looks like a cost-cut initiative.

my personal feeling gpt5-thinking is much faster but doesnt produce the same quality results as o3 which were capable to scan through the code base dump with file names and make correct calls

dont feel any changes with https://chatgpt.com/codex/

my best experience was to use o3 for task analysis, copy paste the result in https://chatgpt.com/codex/, work outside and vibe code from mobile

adeptima · 2025-07-04T18:50:57 1751655057

Foursquare has another open source project worth noting on DuckDB - SQLRooms

https://sqlrooms.org/

“Build data-centric apps with DuckDB An Open Source React Framework for Single-Node Data Analytics powered by DuckDB”

adeptima · 2025-05-07T20:09:41 1746648581

this guy .....

vel0city · 2025-05-13T18:45:41 1747161941

THIS GUY FUCKS!

adeptima · 2025-04-14T14:56:30 1744642590

Meilisearch is great, used it for a quick demo

However if you need a full-text search similar to Apache Lucene, my go-to options are based on Tantivy

Tantivy https://github.com/quickwit-oss/tantivy

Asian language, BM25 scoring, Natural query language, JSON fields indexing support are all must-have features for me

Quickwit - https://github.com/quickwit-oss/quickwit - https://quickwit.io/docs/get-started/quickstart

ParadeDB - https://github.com/paradedb/paradedb

I'm still looking for a systematic approach to make a hybrid search (combined full-text with embedding vectors).

Any thoughts on up-to-date hybrid search experience are greatly appreciated

jitl · 2025-04-14T15:26:58 1744644418

Quickwit was bought by Datadog, so I feel there's some risk quickwit-oss becomes unmaintained if Datadog's corporate priority shifts in the future, or OSS maintenance stops providing return on investment. Based on the Quickwit blog post, they are relicensing to Apache2 and releasing some enterprise features, so it seems very possible the original maintainers will move to other things, and it's unclear if enough community would coalesce to keep the project moving forward.

https://quickwit.io/blog/quickwit-joins-datadog#the-journey-...

iambateman · 2025-04-14T15:55:37 1744646137

I have an implementation of Quickwit, so I've thought about this.

The latest version is stable and fast enough, that I think this won't be an issue for a while. It's the kind of thing that does what it needs to do, at least for me.

But I totally agree that the project is at risk, given the acquisition.

kk3 · 2025-04-14T16:43:46 1744649026

As far as combining full-text search with embedding vectors goes, Typesense has been building features around that - https://typesense.org/docs/28.0/api/vector-search.html

I haven't tried those features but I did try Meilisearch awhile back and I found Typesense to index much faster (which was a bottleneck for my particular use case) and also have many more features to control search/ranking. Although just to say, my use case was not typical for search and I'm sure Meilisearch has come a long way since then, so this is not to speak poorly of Meilisearch, just that Typesense is another great option.

Kerollmops · 2025-04-14T17:57:02 1744653422

Meilisearch just improved the indexing speed and simplified the update path. We released v1.12 and highly improved indexing speed [1]. We improved the upgrade path with the dumpless upgrade feature [2].

The main advantage of Meilisearch is that the content is written to disk. Rebooting an instance is instant, and that's quite useful when booting from a snapshot or upgrading to a smaller or larger machine. We think disk-first is a great approach as the user doesn't fear reindexing when restarting the program.

That's where Meilisearch's dumpless upgrade is excellent: all the content you've previously indexed is still written to disk and slightly modified to be compatible with the latest engine version. This differs from Typesense, where upgrades necessitate reindexing the documents in memory. I don't know about embeddings. Do you have to query OpenAI again when upgrading? Meilisearch keeps the embeddings on disk to avoid costs and remove the indexing time.

[1]: https://github.com/meilisearch/meilisearch/releases/tag/v1.1... [2]: https://github.com/meilisearch/meilisearch/releases/tag/v1.1...

kk3 · 2025-04-15T04:15:56 1744690556

Thank you for the response here. Not being able to upgrade the machine without completely re-indexing has actually become a huge issue for me. My use case is that I need to upgrade the machine to perform a big indexing operation that happens all at once and then after that reduce the machine resources. Typesense has future plans to persist the index to disk but it's not on the road map yet. And with the indexing improvements, Meilisearch may be a viable option for my use case now. I'll be checking this out!

irevoire · 2025-04-14T19:07:46 1744657666

I hate the way typesense are doing their « hybrid search ». It’s called fusion search and the idea is that you have no idea of how well the semantic and full text search are doing, so you’re going to randomly mix them together without looking at all at the results both searches are returning.

I tried to explain to them in an issue that in this state it was pretty much useless because you would always have one or the other search strategy that would give you awful results, but they basically said « some other engines are doing that as well so we won’t try to improve it » + a ton of justification instead of just admitting that this strategy is bad.

jabo · 2025-04-14T20:16:34 1744661794

We generally tend to engage in in-depth conversations with our users.

But in this case, when you opened the GitHub issue, we noticed that you’re part of the Meilisearch team, so we didn’t want to spend too much time explaining something in-depth to someone who was just doing competitive research, when we could have instead spent that time helping other Typesense users. Which is why the response to you might have seemed brief.

For what it’s worth, the approach used in Typesense is called Reciprocal Rank Fusion (RRF) and it’s a well researched topic that has a bunch of academic papers published on it. So it’s best to read those papers to understand the tradeoffs involved.
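For readers who have not met the technique, here is a minimal sketch of Reciprocal Rank Fusion in plain Python. It is a generic illustration, not Typesense's actual implementation: the function name `reciprocal_rank_fusion`, the toy document ids, and the constant `k=60` (a value commonly used in the literature) are all assumptions made for this example. Each document gets `1 / (k + rank)` from every result list it appears in, and the sums decide the final order.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one list.

    rankings: list of lists, each ordered best-first.
    k: smoothing constant (hypothetical default, 60 is a common choice).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top hit contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Toy result lists (made-up ids), one per search strategy.
full_text_hits = ["a", "b", "c"]
semantic_hits = ["c", "a", "d"]

fused = reciprocal_rank_fusion([full_text_hits, semantic_hits])
print(fused)
```

Note that only rank positions go into the fusion; the raw relevance scores of each strategy are never compared. That is the tradeoff being argued about in this subthread.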

irevoire · 2025-04-14T20:29:17 1744662557

> But in this case, when you opened the GitHub issue, we noticed that you’re part of the Meilisearch team, so we didn’t want to spend too much time explaining something in-depth to someone who was just doing competitive research, when we could have instead spent that time helping other Typesense users. Which is why the response to you might have seemed brief.

Well, in this case I was just trying to be a normal user who wants the best relevancy possible and couldn’t find a solution. But the reason I couldn’t find it was not that you didn’t want to spend more time on my case; it was that Typesense provides no solution to this problem.

> it’s a well researched topic that has a bunch of academic papers published on it. So it’s best to read those papers to understand the tradeoffs involved.

Yeah, cool, or in other words « it’s bad, we know it and we can’t help you, but it’s the state of the art, you should educate yourself ». But guess what: Meilisearch may need some fine-tuning around your model etc., but in the end it gives you the tools to build a proper hybrid search that knows the quality of the results before mixing them.

If other people want to see the original issue: https://github.com/typesense/typesense/issues/1964

spiderfarmer · 2025-04-14T20:42:30 1744663350

I think this is a good example of why people should disclose their background when commenting on competing products/projects. Even if the intentions were sound, which seems to be the case here, upfront disclosure would have given the conversation more weight and meaning.

jimmydoe · 2025-04-15T16:55:46 1744736146

+1, Typesense is really fast. The only drawback is that startup gets slow as the index grows. The good thing is that full-text search (excluding vector) is a relatively stable feature set, so if your use case is just FTS, you won't need to restart very often for version upgrades.

inertiatic · 2025-04-14T15:42:31 1744645351

> I'm still looking for a systematic approach to make a hybrid search (combined full-text with embedding vectors).

Start off with ES or Vespa, probably. ES is not hard at all to get started with, IMO.

Try RRF - see how far that gets you for your use case. If it's not where you want to be, time to get thinking about what you're trying to do. Maybe a score multiplication gets you where you want to be - you can do it in Vespa I think, but you have to hack around the inability to express exactly that in ES.
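
A rough sketch of what "score multiplication" could look like, independent of any engine. This is plain Python, not ES or Vespa syntax; the function name is hypothetical, and it assumes both scores have already been normalized to the 0..1 range, which real engines do not guarantee.

```python
def multiply_scores(keyword_scores, vector_scores, default=0.0):
    """Rank documents by the product of their keyword and vector scores.

    keyword_scores, vector_scores: dicts mapping document id -> score,
    assumed to be normalized to [0, 1].
    default: score used when a document is missing from one side.
    """
    doc_ids = set(keyword_scores) | set(vector_scores)
    combined = {
        d: keyword_scores.get(d, default) * vector_scores.get(d, default)
        for d in doc_ids
    }
    # Highest product first; ties are broken by document id.
    return sorted(combined, key=lambda d: (-combined[d], d))


# Hypothetical normalized scores from the two retrievers.
keyword_scores = {"a": 0.9, "b": 0.5, "c": 0.2}
vector_scores = {"a": 0.3, "b": 0.8, "d": 0.9}
ranked = multiply_scores(keyword_scores, vector_scores)
```

With `default=0.0`, a document has to match on both sides to score at all; a small positive default would let one-sided matches through at a heavy penalty.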

navaed01 · 2025-04-15T02:06:10 1744682770

I’m using Typesense hybrid search. It does the job, is well priced, and is low-effort to implement. Feel free to ask any specific questions.

Kerollmops · 2025-04-15T07:08:36 1744700916

You should try Meilisearch then, you'll be astonished by the quality of the results and the ease of setup.

yencabulator · 2025-04-15T20:21:37 1744748497

https://news.ycombinator.com/user?id=Kerollmops

> Meilisearch Co-Founder and Tech Lead.

You really should disclose your affiliation.

Epicism · 2025-04-16T12:36:17 1744806977

Try LanceDB https://github.com/lancedb/lancedb

It’s based on the DataFusion engine, has vector indexing and BM25 indexing, and has Python and Rust bindings.

Kerollmops · 2025-04-14T16:32:39 1744648359

> I'm still looking for a systematic approach to make a hybrid search (combined full-text with embedding vectors).

You know that Meilisearch is the way to go, right? Tantivy, even though I love the product, doesn't support vector search. Meilisearch's hybrid search is stunningly good. You can try it on our demo [1].

[1]: https://wheretowatch.meilisearch.com/

oulipo · 2025-04-14T19:16:36 1744658196

Why wouldn't it be possible to just embed Meilisearch/Tantivy/Quickwit inside Postgres as a plugin to simplify the setup?

Kerollmops · 2025-04-14T20:01:11 1744660871

> [..] to simplify the setup?

It would be simpler to keep Meilisearch and its key-value store out of Postgres' WAL and so on, and instead offer a good SQL exporter (which is planned).

oulipo · 2025-04-15T08:58:41 1744707521

Perhaps on a technical level, but for a dev, if I just need to install Postgres and some plugins and, boom, I have a fully searchable index, that's even easier.

adeptima · 2025-04-09T05:06:42 1744175202

Look at Superset chart implementations and component choice all the time.