riquito · 2025-09-05T22:05:11 1757109911

I think he implies that because one can borrow hypothetically any book for free from a library, one could use them for legal training purposes, so the requirement of having your own copy should be moot

jazzyjackson · 2025-09-05T22:27:18 1757111238

Libraries aren’t just anarchist free-for-alls; they operate under licensing terms. Google had a big squabble with the University of Illinois Urbana-Champaign research library before finally getting permission to scan the books there. Guess what: Google has the full text, but books.google.com only shows previews. Why is, literally, an exercise for the reader.

gpm · 2025-09-05T22:31:00 1757111460

Libraries are neither anarchist free-for-alls nor operating under licensing terms with regard to physical books.

They're merely doing what anyone is allowed to with the books that they own, loaning them out, because copyright law doesn't prohibit that, so no license is needed.

lotsoweiners · 2025-09-05T22:53:39 1757112819

Yup. And if the Anthropic CEO or whoever wants to drive down to the library, check out 30 books (or whatever the limit is), scan them, and then return them, that is their prerogative I guess.

mdp2021 · 2025-09-05T23:00:34 1757113234

Scanning (copying) is¹ not allowed. Reading is.

What is in a library, you can freely read. Find the most appropriate way. You do not need to have bought the book.

¹(Edit: or /may/ not be allowed, see posts below.)

gpm · 2025-09-05T23:06:37 1757113597

Scanning is, under the right circumstances, allowed in the US, at least per the Second Circuit appeals court (Connecticut, New York, Vermont): https://en.wikipedia.org/wiki/Authors_Guild%2C_Inc._v._Googl....

rvnx · 2025-09-06T08:32:00 1757147520

They (OpenAI and Anthropic) operate their platforms and distribute these copyrighted works outside the US, where these foreign laws apply.

jrockway · 2025-09-05T23:02:49 1757113369

There are no terms and conditions attached to library books beyond copyright law (which says nothing about scanning) and the general premise of being a library (return the book in good condition on time or pay).

mdp2021 · 2025-09-05T23:13:34 1757114014

Copyright law in the USA may be more liberal about scanning than other jurisdictions (see the parallel comment from gpm), which expressly regulate the amount of copying of material you do not own as an item.

gpm · 2025-09-05T23:19:59 1757114399

The jurisdictions I'm familiar with all give vague fair use/fair dealing exceptions which would cover some but not all copying (including scanning) with less than clear boundaries.

I'd be interested to know if you knew of one with bright line rules delineating what is and isn't allowed.

mdp2021 · 2025-09-06T09:07:34 1757149654

> if you knew of one with bright line rules

(I know by practice but not from the letter of the law; to give you details I would need to do some research and it will take time. If I manage to, I will send you an email, but I doubt I will be able to do it soon. The focus is anyway on Western European countries.)

bandrami · 2025-09-06T03:09:14 1757128154

Scanning in a way that results in a copy of the book being saved is a right reserved to the holder of the copyright

kjkjadksj · 2025-09-05T22:51:08 1757112668

Afaik to scan a book you need to destroy it by cutting the spine so it can feed cleanly into the scanner. Would incur a lot of fines.

wizzwizz4 · 2025-09-05T22:56:51 1757113011

Nah, that's just if you want archival-quality scans. "Good enough for OCR" is a much lower bar.

mkagenius · 2025-09-06T00:56:12 1757120172

Anthropic hired the book-scanning guy from Google for 1M+ USD to do just that (open the bindings).

mkagenius · 2025-09-06T00:53:54 1757120034

That's what they did. They also destroyed books worth millions in the process.

They didn't think it would be a good idea to re-bind them and distribute them to a library or someone in need.

nl · 2025-09-06T01:39:21 1757122761

To be clear, they destructively scanned millions of books which in total were worth millions of dollars.

They did not destroy old, valuable books which individually were worth millions.

https://arstechnica.com/ai/2025/06/anthropic-destroyed-milli...

xp84 · 2025-09-06T22:55:21 1757199321

I really don’t think there’s any demand out there for re-bound used paper books when most books can be had in their real binding for $3 or less. It would cost at least $3 to re-bind, then they’d have to be listed on Amazon marketplace in “Poor condition” where they’d be valued at maybe $0.50 and cost $3 to ship, and they’d take years of warehousing at great expense waiting to be sold.

As for needy people, they already have libraries and an endless stream of books being donated to thrift stores. Nothing of value was lost here.

mkagenius · 2025-09-07T01:55:46 1757210146

> Nothing of value was lost here

But then they shouldn't have done that in the first place. It seems like a crime to destroy so many books.

Imagine, 10 more companies come to join the AI race and decide to do the same.

kjkjadksj · 2025-09-07T21:19:54 1757279994

To be fair, a book is fundamentally a wear item. I remember learning how my university library had its own incinerator. After a certain point it makes no sense to have 30 copies of an outdated textbook taking up space in the racks. Same goes for beat-up old fiction and what have you. One might think a little urban school or branch library might want some, but they too deal with the realities of shelf space constraints and would probably prefer that their patrons had materials that are more current or in better shape.

That being said, I’m sure these companies did not exclusively buy books at the end of their life.

xp84 · 2025-09-07T16:10:25 1757261425

Books are printed in very large quantities, and there isn't infinite warehousing space for them "just in case." Surplus books just get sent straight to recycling all the time to make room for new books. I would be surprised if while this project was running, it represented even 10% of the daily books being destroyed. It's just never been practical to save every book printed forever.

ijk · 2025-09-06T06:55:09 1757141709

There are book scanners that don't require cutting the spine, though Anthropic doesn't seem to have used that approach.

riquito · 2025-09-03T16:53:29 1756918409

Probably not even the best ones, but among some recent models I find Dia and Orpheus more natural

- http://dia-tts.com/

- https://github.com/canopyai/Orpheus-TTS

riquito · 2025-08-09T15:07:21 1754752041

Closed source, without 3rd party independent review and people should just trust you? As if your app cannot start sending data away in a month or attempt to detect monitoring software, to name a couple

riquito · 2025-07-23T15:20:28 1753284028

> no other PC or mobile phone manufacturer is providing warranty service (for consumer hardware) that remotely matches Apple's.

Maybe, but Apple is also among the worst companies for repairability of their hardware. If a PC (which you mention) breaks, it is usually only one part that needs to be replaced (without looking at actual repairs), and any individual with the necessary know-how can do it.

otterley · 2025-07-23T15:36:34 1753284994

Those are two separate issues. The claim is that you’re paying for Apple to repair or replace your goods if a problem arises, and they’re more than capable of doing that.

riquito · 2025-05-07T14:49:36 1746629376

Not that people are obligated to use IntelliJ IDEs, but it's sad that it boils down to "You can have privacy if you can afford it". But admittedly it is better to have the option to use it than not being able to use it at all.

atemerev · 2025-05-07T14:56:38 1746629798

Their telemetry promises not to collect private data. Yes, your code will probably be used for training their models. But so it would be if you published it on GitHub.

ndriscoll · 2025-05-07T16:22:44 1746634964

All data on my computer is private unless I specifically make it public. Data thieves like to make a rhetorical sleight of hand where they say they're not really collecting data about you (this happens especially with the topic of "differential privacy"), but that's just gaslighting (i.e. trying to manipulate you into thinking you simply don't understand what they are doing). e.g. I'm not willing to share noisy correlations about my preferences either. That information is private, and it is information, or they wouldn't want it.

atemerev · 2025-05-07T16:41:50 1746636110

Then you are free to not use their product.

My privacy is indeed differential. I am willing to give them the information on my coding patterns and even non-commercial code for a free license, if it is not linked to my identity. This is a fair exchange. I am not willing to do this if they will use this information to sell me ads, or sell it (unless properly anonymized) to some other company. And most certainly I won't agree if they collect any information beyond what happens in the IDE.

Not _everything_ I do on my computer is fully private. I apply much stricter standards to things that are _really_ private. But not everything is like this.

This comment is public. It will probably be used to train yet another LLM. I am fine with that.

ndriscoll · 2025-05-07T16:46:42 1746636402

Sure, but as I said elsewhere, it's still important that people point out that past the headline ("it's free") is that it is also malware (it spies on you). People were free to not use BonziBuddy as well, but it was rightfully characterized at the time as spyware. If the product also functioned as a proxy for botnet traffic, you wouldn't simply say "well you're free to not use it". You'd say "beware, the 'free' version is malware". Spyware is similar.

Posting to a public online forum is of course specifically making the post public.

riquito · 2025-05-07T14:45:40 1746629140

> > It’s important to note that, if you’re using a non-commercial license, you cannot opt out of the collection of anonymous usage statistics. We use this information to improve our products.

> Well, it's basically true for MS-branded VSCode too. I now use VSCodium.

How's that "basically true"? That's false. You can opt out. In fact there's very good documentation around that

https://code.visualstudio.com/docs/configure/telemetry

hyper57 · 2025-05-07T15:34:59 1746632099

According to Microsoft's own license terms for VS Code, you can't opt out of all telemetry; see Section 2a: https://code.visualstudio.com/license

> You may opt-out of many of these scenarios, but not all, as described in the product documentation located at https://code.visualstudio.com/docs/supporting/faq#_how-to-di....

Also, each extension (including Microsoft's) may collect its own telemetry. The blog post https://www.roboleary.net/tools/2022/04/20/vscode-telemetry has more details.

Personally, I think it's a shame that JetBrains get such flak for collecting telemetry in their free products when Microsoft do the same in VS Code with hardly anyone voicing the same level of criticism for it.

voidspark · 2025-05-07T18:50:14 1746643814

> a shame that JetBrains get such flack for collecting telemetry in their free products

Probably 99.99% of developers don’t care.

The ones who complain about it online are a tiny vocal minority.

riquito · 2025-05-06T02:56:10 1746500170

Very cool, thanks for sharing.

A couple questions:

- any thought about wake word engines, to have something that listens without consuming resources all the time? The landscape for open solutions doesn't seem good

- any plan to allow using external services for STT/TTS for the people who don't have a 4090 ready (at the cost of privacy and SaaS providers)?

TeMPOraL · 2025-05-06T08:05:09 1746518709

FWIW, wake words are a stopgap; if we want to have Star Trek-level voice interfaces, where the computer responds only when you actually meant to call it, as opposed to using the wake word as a normal word in the conversation, the computer needs to be constantly listening.

A good analogy here is to think of the computer (assistant) as another person in the room, busy with their own stuff but paying attention to the conversations happening around them, in case someone suddenly requests their assistance.

This, of course, could be handled by a more lightweight LLM running locally and listening for explicit mentions/addressing the computer/assistant, as opposed to some context-free wake words.

Dr4kn · 2025-05-06T08:38:19 1746520699

Home Assistant is much nearer to this than other solutions.

You have a wake word, but it can also speak to you based on automations. You come home and it could tell you that the milk is empty, but with a holiday coming up you probably should go shopping.

Dlemo · 2025-05-06T10:36:21 1746527781

I want that for privacy reasons and for resource reasons.

And having this as a small hardware device should not add relevant latency to it.

jillyboel · 2025-05-06T10:58:50 1746529130

Privacy isn't a concern when everything is local

Dlemo · 2025-05-06T19:51:42 1746561102

Yes it is.

Malware, bugs etc can happen.

And I also might not want to disable it for every guest either.

ben_w · 2025-05-06T20:50:42 1746564642

If the AI is local, it doesn't need to be on an internet connected device. At that point, malware and bugs in that stack don't add extra privacy risks* — but malware and bugs in all your other devices with microphones etc. remain a risk, even if the LLM is absolutely perfect by whatever standard that means for you.

* unless you put the AI on a robot body, but that's then your own new and exciting problem.

jillyboel · 2025-05-06T21:12:19 1746565939

There is no privacy difference between a local LLM listening versus a local wake word model listening.

koljab · 2025-05-06T10:40:18 1746528018

That would be quite easy to integrate. RealtimeSTT already has wake word support for both pvporcupine and openWakeWord.

justlikereddit · 2025-05-06T07:24:27 1746516267

Modify it with an ultra-light LLM agent that always listens and uses a wake word to agentically call the paid API?

Dr4kn · 2025-05-06T08:36:06 1746520566

You could use openWakeWord, which Home Assistant developed for its own voice assistant.

supermatt · 2025-05-06T09:32:30 1746523950

It was developed by David Scripka: https://github.com/dscripka/openWakeWord

riquito · 2025-05-01T22:19:26 1746137966

> Waymo is limited to few specific locations with decent roads and does not drive in poor weather

The study compares Waymo to accidents that occurred in the same cities where Waymo operates, and my understanding is that Waymo drives 7 days a week, 24h a day in those cities, so same roads, same weather. Seems a legit comparison.

notTooFarGone · 2025-05-02T07:19:33 1746170373

Also there is some sort of bias not accounted for: People drive when most people drive and most people are stuck in the most dangerous area: traffic. Waymo driving at night on empty streets is not a good indicator for accident prevention when measured against the average human, who is stuck mostly in traffic.

ra7 · 2025-05-02T16:06:53 1746202013

Why do you believe Waymo's miles are from driving at night on empty streets? They drive when there's rideshare demand, a majority of which occurs during daytime and in the busiest areas of a city. They are no less stuck in traffic than the average human.

manquer · 2025-05-02T01:58:08 1746151088

Not the same cars though; people are not driving newish mid-range SUVs with professional maintenance all the time.

riquito · 2025-04-30T15:31:48 1746027108

Love it, it's brilliant, but I think the rate limiting logic is not doing what the author really wants: it actually costs more CPU to detect and produce the error than to return the regular response (then my mind goes to how to actually over-optimize this thing, but that's another story :-D )

riquito · 2025-04-29T16:46:12 1745945172

They don't consume anything, at my peak I closed 2736 tabs (I have a photo to commemorate). Firefox somehow didn't care

malfist · 2025-04-30T04:29:38 1745987378

If you don't mind sharing it, I'd love to see it. I want to prove to my coworkers I've not got a problem!