Why not fix all the broken doc links and make sure you have the full SDK spec down first, ready to go? Then drop it all at once, when it's actually ready. That would be better and more respectful of users. I love the product and want y'all to succeed, but this came off as extremely unprofessional.
Really appreciate the candid feedback, and glad to hear you like the product. We ran a broken links checker against our docs, but it's possible we missed something. Is there anywhere you're seeing a broken link?
Re SDK specs -- I assume you mean full SDK API references? We're nearly at the point where those will be published, and I agree that they would be incredibly useful.
1) Let us keep the right sidebar permanently open, and DON'T grey out the rest of the screen. I want to be able to click on target-language words and immediately see them. You've given us the translated sentence, but I can't see which word is which;
2) Colour _the same words_ in both languages on mouseover;
3) Or just highlight the corresponding words in BOTH languages as we're listening [but note the issue below!];
4) Make the keyboard use a bit more intuitive - i.e. left/right obviously means "go back or forward in the video/audio", but right now I have to CLICK on the YouTube video again to get that behaviour. It should just work so I don't have to do that. Similarly, I want to click on a word to see its meaning, but then go back to space-to-pause behaviour; right now clicking a word breaks that. It just adds friction for users [rough sketch of what I mean at the end of this comment].
5) Consider using yt-dlp to save the videos, so that if we're studying one and YouTube pulls it, we can keep using it. Maybe one for the roadmap.
6) Consider allowing us to add words to vocab -- and choose which vocab -- directly from the mouseover [without clogging up the UI - not sure how]. Right now it's a bit convoluted [right sidebar, which again should be permanent and integrated, not greying out the main screen - but even if that were fixed, that's a lot of mouse movement].
7) Handle idiomatic language better. You'll probably need another LLM pass/method for this, but it's a BIG one! Languages obviously don't map 1:1, so for example this one:
"genommen" was translated as "taken" <- which means nothing here.
I dumped it into 4o and it explained:
In the phrase „genau genommen“, the word „genommen“ is part of a fixed idiomatic expression and doesn't translate literally as "taken."
„Genau genommen“ means "strictly speaking" or "to be precise."
So the full sentence:
„Wir sind heute wieder auf der Straße unterwegs, genau genommen auf dem Flohmarkt…“
translates to:
"Today we're out on the street again — strictly speaking, at the flea market…"
It’s specifying or narrowing down what “on the street” means in this context.
So you'll need to pull out these idiomatic phrases and make sure they can be analysed as a single unit, so to speak. Learners are going to have to get acquainted with those, and right now the workflow is obviously broken for them.
Basically, get a model to bundle them, and then in the right sidebar (the one with "drill into X") you'd have the PHRASE as the unit of analysis.
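To make the bundling idea concrete, here's a rough sketch of what that extra LLM pass could look like. Obviously not your code - just an illustration assuming an OpenAI-style chat completions call; the function name, prompt, and JSON shape are all made up.

```ts
// Rough sketch (not the app's actual code): one extra LLM pass that returns
// multi-word expressions so the UI can treat them as a single unit.
// Assumes the OpenAI Node SDK; the model name, prompt, and JSON shape are illustrative.
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

interface IdiomSpan {
  phrase: string;  // e.g. "genau genommen"
  start: number;   // 0-based index of the first word in the source sentence
  end: number;     // index of the last word (inclusive)
  meaning: string; // e.g. "strictly speaking / to be precise"
}

export async function extractIdioms(sentence: string): Promise<IdiomSpan[]> {
  const completion = await client.chat.completions.create({
    model: "gpt-4o-mini", // whatever the budget allows
    response_format: { type: "json_object" },
    messages: [
      {
        role: "system",
        content:
          "You annotate sentences for language learners. Return JSON of the form " +
          '{"idioms": [{"phrase": string, "start": number, "end": number, "meaning": string}]} ' +
          "listing fixed or idiomatic multi-word expressions whose words should not be " +
          "translated individually. Indices are 0-based word positions. " +
          'Return {"idioms": []} if there are none.',
      },
      { role: "user", content: sentence },
    ],
  });

  const raw = completion.choices[0]?.message?.content ?? '{"idioms": []}';
  return (JSON.parse(raw).idioms ?? []) as IdiomSpan[];
}

// extractIdioms("Wir sind heute wieder auf der Straße unterwegs, genau genommen auf dem Flohmarkt")
// should come back with one span covering "genau genommen" -> "strictly speaking",
// which the sidebar can then offer as a single drill-in unit.
```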
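For point 4, the keyboard thing seems fixable with a document-level key handler that forwards to the embedded player, so it doesn't matter what was clicked last. Again just a sketch, assuming the YouTube IFrame Player API with a `YT.Player` instance already in scope; the names are illustrative.

```ts
// Sketch for point 4 (not your code): a document-level key handler that forwards
// space/arrows to the embedded player, so it keeps working after clicking a word.
// Assumes the YouTube IFrame Player API and @types/youtube-style typings;
// the `player` instance is created elsewhere via `new YT.Player(...)`.

declare const player: YT.Player;

const SEEK_STEP = 5; // seconds per left/right press

document.addEventListener("keydown", (e) => {
  // Don't steal keys from real text inputs.
  const target = e.target as HTMLElement;
  if (target.tagName === "INPUT" || target.tagName === "TEXTAREA" || target.isContentEditable) {
    return;
  }

  switch (e.key) {
    case " ": // space toggles play/pause regardless of what was clicked last
      e.preventDefault();
      if (player.getPlayerState() === YT.PlayerState.PLAYING) {
        player.pauseVideo();
      } else {
        player.playVideo();
      }
      break;
    case "ArrowLeft":
      e.preventDefault();
      player.seekTo(Math.max(0, player.getCurrentTime() - SEEK_STEP), true);
      break;
    case "ArrowRight":
      e.preventDefault();
      player.seekTo(player.getCurrentTime() + SEEK_STEP, true);
      break;
  }
});
```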
1. Makes sense!
2./3. That's a bit hard, but like point 7, I think it's possible to map certain parts.
4. Makes sense
5. I've put it on the roadmap, but I don't think it's much of a priority right now. I want to have an offline mode at some point (as well as a dedicated app).
7. Yes, this is hard and expensive. But I think I should have a high-quality section with proper quality control. I have some ideas for quickly creating lessons as a teacher, but right now I'm mainly firefighting stability and quality.
For 2/3: isn't it just another API call to get the mappings [solving 7 as a side effect], then wiring it up to the frontend like you already do?
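To be concrete about what I mean by wiring it up - assuming the backend can return word-alignment pairs (source word index -> target word index) from the same model call, the hover highlighting is mostly bookkeeping. Rough sketch only, with invented markup and class names:

```ts
// Sketch only: wiring word-alignment data into hover highlighting (points 2/3).
// Assumes the backend returns pairs of word indices alongside the translation;
// the markup and the "hl" class name are invented for illustration.

type AlignmentPair = [sourceIdx: number, targetIdx: number];

function wireUpHighlighting(
  sourceWords: NodeListOf<HTMLElement>, // e.g. <span class="word"> elements in the German line
  targetWords: NodeListOf<HTMLElement>, // same for the English line
  alignment: AlignmentPair[],           // e.g. [[7, 5], [8, 5], [7, 6], [8, 6]] for "genau genommen" -> "strictly speaking"
) {
  // Build a source-index -> target-indices lookup once.
  const map = new Map<number, number[]>();
  for (const [s, t] of alignment) {
    const existing = map.get(s);
    if (existing) existing.push(t);
    else map.set(s, [t]);
  }

  sourceWords.forEach((el, i) => {
    const toggle = (on: boolean) => {
      el.classList.toggle("hl", on);
      for (const t of map.get(i) ?? []) targetWords[t]?.classList.toggle("hl", on);
    };
    el.addEventListener("mouseenter", () => toggle(true));
    el.addEventListener("mouseleave", () => toggle(false));
  });
}
```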
The sidebar greying out the foreground and not being able to stay locked REALLY breaks the flow. Fixing that alone would mitigate things a bit.
I'd straight up pay 5 or 10 bucks a month for this if it were, like... 200% more functional/featured/professional/working. VERY good proof of concept and I love it. Target language is German, FWIW.
Thanks for the feedback! It is indeed a PoC that I'm hacking together after work.
I've been working hard to get the quality up, and now that I have some paid users for the larger languages, I can also auto-transcribe high-quality channels. The main reason for the poor exercises (especially for German) was that I initially picked some poor channels and was being cheap.
I've updated the German channel, and that should hopefully result in a better experience.
What's the second-level analysis here? We know it's not really necessary or helpful for the ostensible reason (there are far cheaper/more reliable ways of capturing entropy), so we conclude it's a marketing gimmick. Yet for the gimmick to work, they have to pretend it's useful. They're not fooling themselves or anyone else, though.
So what’s really going on?
Is it:
- it IS somehow a good return on investment??
- marketing had a budget and didn't know how else to spend it, and no one wanted to be the unpleasant person who says it's all a silly waste of money?
- they're making a tonne of money and no one really cares, so we'll just spend it on fun, cool stuff as long as there's a plausible-ish story to go with it?
- fits with a broader global company branding concept that leadership seems to like, so there’s just the momentum to keep it going (and see points above)?
I can't figure it out. I agree it's cool! Just the make-believe puzzles me a little. I've never worked at a big corp like this and am just trying to understand what's actually happening.
If it were merely marketing spend for customer acquisition, I bet the ROI on the lava lamp wall in SF has been 100,000x. This isn’t hard to figure out.
Practically, what happens if the founders don't want to play ball? Can VCs muscle up and force a sale on worse terms? It's just not clear how much of what goes on is asking nicely vs. coercion, and where the actual leverage lies.
It's going to depend on the terms you agreed to. In my experience, company leadership is free to suggest an alternative to the hyper-growth plan, but the board (which you likely won't have a majority on) will reject it. VCs don't care about recouping money; they'd rather you fired everyone, pivoted, and tried again with the money you have. They want successful exits, not break-evens. It's not about money, it's about reputation.
If you suggest something they don’t like, they’ll stall until you run out of money, or force you out.
A multilingual version will probably be needed, as with BERT and RoBERTa (mBERT, XLM-RoBERTa). I should hasten to add that for multi-language tasks (beyond detection), either simpler methods for tasks like multi-language classification/prediction (e.g. word frequencies, BERTopic-like approaches, or SVMs) or LLMs are generally the better candidates.
There are a couple of reasons:
1) Getting good BLEU scores across multiple languages is too much to ask of a model that size (even the large variant).
2) Encoder and decoder models don't tend to be trained for translation as much as, e.g., GPT models, which have large amounts of translated text across multiple languages in their datasets (with exceptions such as the T5 translation task).
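For the LLM route, here's a minimal sketch of what that looks like in practice (assuming the OpenAI Node SDK; the model name and prompt are placeholders, not a recommendation):

```ts
// Minimal sketch of the LLM route for translation, instead of fine-tuning a small
// encoder-decoder per language pair. Assumes the OpenAI Node SDK; the model name
// and prompt are placeholders.
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function translate(text: string, sourceLang: string, targetLang: string): Promise<string> {
  const completion = await client.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      {
        role: "system",
        content: `Translate the user's text from ${sourceLang} to ${targetLang}. Return only the translation.`,
      },
      { role: "user", content: text },
    ],
  });
  return completion.choices[0]?.message?.content?.trim() ?? "";
}

// await translate("Das ist ein Beispielsatz.", "German", "English") // -> "This is an example sentence."
```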