
rebelde · 2024-02-22T21:17:08.000000Z

Google is paying Reddit instead of just taking it for free like they do from all other websites?

BadHumans · 2024-02-22T21:51:23.000000Z

Second line in the article: "In an update on Thursday, Reddit announced it will start providing Google 'more efficient ways to train models.'"

They will be helping Google create a real-time pipeline to Reddit's data.

rebelde · 2024-01-12T00:31:37.000000Z

For me, body damage far from the battery was normally priced. Anything close to the battery was outrageously expensive.

rebelde · 2024-01-10T18:29:59.000000Z

Microsoft, like the old Microsoft, seems to completely reject all these modern methods and use its own instead. So you get a lot of spam, and my legitimate emails get rejected.

rebelde · 2023-11-25T01:39:50.000000Z

Feature request for any service like this: Let "me" know if it is a school, so I know that I am probably dealing with minors, a public environment and a firewall. Of course, you need to do the work (rDNS to start) to identify the schools.

I love the speed of the responses!
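For the "rDNS to start" idea above, a rough Python sketch might look like this. It assumes schools show up with a `k12` or `edu` label in their reverse-DNS (PTR) hostnames, which is only a common convention and often not true, so treat it as a weak hint rather than an identification. `reverse_dns`, `looks_like_school`, `SCHOOL_LABELS` and the sample hostnames are all made up for illustration.

```python
import socket
from typing import Optional

# Assumed naming conventions (not guaranteed): many US K-12 districts sit
# under a "k12" label (e.g. *.k12.ca.us) and many colleges under "edu".
# Plenty of schools use neither, so this will miss a lot.
SCHOOL_LABELS = {"k12", "edu"}


def reverse_dns(ip: str) -> Optional[str]:
    """Return the PTR hostname for an IP address, or None if there isn't one."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return None


def looks_like_school(hostname: Optional[str]) -> bool:
    """Crude heuristic: True if any whole DNS label is a school-ish label."""
    if not hostname:
        return False
    labels = hostname.lower().rstrip(".").split(".")
    # Match whole labels only, so "education.example.com" does not count.
    return any(label in SCHOOL_LABELS for label in labels)


if __name__ == "__main__":
    # Made-up hostnames; in practice you would pass reverse_dns(ip).
    for host in ["host-7.example.k12.ca.us", "lab.example.edu", "pool-1.isp.example.net"]:
        print(host, looks_like_school(host))
```

Matching whole labels instead of substrings avoids obvious false positives, but it cannot fix the underlying accuracy problem of inconsistent naming.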

reincoder · 2023-11-25T13:36:03.000000Z

I work for IPinfo. We have higher level IP category information on our website for free.

We categorize ASNs and companies/organizations into 4 categories: ISP, Education, Hosting and Business. This ASN-level categorization is done mainly from WHOIS and other public internet records.

We don't sub-categorize by schools, universities, public research institutions, K-12, etc. The reason is accuracy. Even though I can understand the possible methods for doing this, the issue is that it cannot be done reliably at scale.

As a data provider, we aim to provide the highest possible accuracy and to vouch for the service we provide. For this level of classification, we generally ask users what data they need from us and try to help them arrive at a solution they build on their own. They can then do whatever classification they want based on their own tolerance for inaccuracy.

slig · 2023-11-25T12:31:15.000000Z

IPINFO.io has that info, but I too would like to have this information in a free service.

inemesitaffia · 2023-11-26T14:08:17.000000Z

Universities and post-secondary educational institutions can have K-12 schools attached, especially if they study education or are isolated.

avipars · 2023-11-26T11:13:31.000000Z

Seems like schools should opt in by submitting their IP address/range.

rebelde · on April 21, 2023

Is there a name for designers who know CSS? I guess not. It would make it easier for me to search for and hire them if I could identify them better.

ckz · on April 21, 2023

In the past I've just called myself a full-stack designer (+/- extra words like UI/UX or developer) or some variant thereof. The hiring process isn't really aligned to finding such folks, though. Neither are internal progression systems at some companies.

My first internship was as a designer, then I freelanced for them as a dev, then when I graduated and it came time to join full-time, I was asked to pick one track or the other. :)

rebelde · on April 18, 2023

How do they plan to keep Google from using its search index of Reddit for training? Or keep OpenAI from using Common Crawl? Do they simply add "No AI" to their TOS?

rebelde · on March 14, 2023

Why write your thoughts on the web when AI/GPT is only going to steal and paraphrase them? Nobody sees what you write and everybody thinks GPT is the genius.

cableshaft · on March 14, 2023

Just saw something today where the wife of TotalBiscuit, who died of cancer several years ago, is contemplating deleting all of his YouTube videos[1] to prevent people from using A.I. to make him say terrible things.

It did give me a bit of pause about putting stuff out there. Although I think I'd still rather have my data be used for training A.I. than not (and I'm probably already in the training data anyway; I believe I saw that one of the datasets it's been trained on was Hacker News comments).

[1]: https://kotaku.com/totalbiscuit-john-bain-youtube-delete-vid...

Dalewyn · on March 15, 2023

Given that the "AI" community apparently couldn't care less about intellectual property rights and treats them with wanton abandon, I can't say such a response would be unwarranted.

Dire circumstances call for drastic measures, as they say.

jeroenhd · on March 15, 2023

Quite a sad, but completely understandable reaction. The saddest part is probably that it's already too late to prevent people from generating TB deepfakes and other content. Cloning a voice takes half an hour of clips now; any downloaded live stream should already be enough.

It's sad to see AI on a path to destroy years of collected internet content. I expect the Internet Archive to receive loads of takedown requests in the coming months and years because of this.

nodemaker · on March 14, 2023

I would like to make the opposite argument. All this time I didn't share my thoughts because everyone else was sharing theirs and my voice would be drowned in a sea of voices. In the post-GPT-4 era it's easier to stand out if your thoughts are actually original and refreshing, because most people sound like their thoughts were written by GPT.

To rephrase it another way, the reign of the conformist ends here and the reign of the contrarian begins now.

throwaway675309 · on March 15, 2023

A lovely sentiment in theory, but Waldo is still perniciously difficult to find even though he dresses differently from every other character.

nodemaker · on March 15, 2023

What if all the characters other than Waldo were just dressing the same because they were trying to ape each other to get fictitious points on social forums? The internet has trained an entire generation to make arguments for validation on social media, and that definitely shows in the ideas that are put forward.

EamonnMR · on March 15, 2023

Or just the reign of brevity. Sheer volume is no longer impressive.

nodemaker · on March 15, 2023

Great point. Using more volume to explain the same thought is more GPT-like.

lupire · on March 15, 2023

Your ideas are low probability autocomplete. GPT wants popular ideas, not novel ideas.

nodemaker · on March 15, 2023

I was trying to say that what most people say is mostly unoriginal and is very reminiscent of GPT style writing. What data GPT trains on or pays attention to is another question.

SketchySeaBeast · on March 14, 2023

That's why I keep my content as low quality as possible - keeps the machines humble.

ThrowawayTestr · on March 15, 2023

I'll just run it through an AI upscaler before I run it through the AI language model.

SketchySeaBeast · on March 15, 2023

We don't need an upscaler, we need an upclasser so all the ASCII Dickbutts drawn get little top hats and monocles put on them.

pklausler · on March 14, 2023

The general problem of "AI"s being trained on copyrighted content needs to be discussed more thoroughly, I think.

bluefirebrand · on March 14, 2023

Every time I bring this up, people accuse me of resisting progress, say "the cat's out of the bag", etc.

It has been frustrating.

km3r · on March 14, 2023

The cat is out of the bag, and I don't see any reason training should be any more controlled than me personally viewing something and 'training' my brain on it. Using either to duplicate copyrighted works is already clearly illegal.

angrais · on March 14, 2023

It is illegal for you to download copyrighted material and distribute it as your own. Models trained on such data can (and are statistically more likely to) produce output similar to their training input.

So training must consider licensing wherever copyrighted material is used, rather than consuming all available data.

Your brain is not a model. You cannot reproduce most of what you see. You're not "training" your brain by glancing at an image, as your recall of that image will be terrible.

meh8881 · on March 15, 2023

My brain can certainly recreate something it’s seen before. And it can certainly create something similar to a thing it’s seen before. It’s illegal to do the former and legal to do the latter. Imperfections in the exact recreations don’t affect the legality of it.

Am I violating copyright law because I am merely capable of producing a copy of something? Obviously not. Why should the model be?

antibasilisk · on March 15, 2023

>It is illegal for you to download copyrighted material and distribute it as your own

I'm sure the millions of people who violate copyright law daily with absolutely no repercussions care very much about that.

ClumsyPilot · on March 15, 2023

Millions of people don't pay taxes and cross the road in the wrong place.

You can't set up a cinema and charge for tickets to movies you stole.

It's the money-making side that matters, not individuals in a private house.

antibasilisk · on March 15, 2023

OK, so then let's violate copyright and open source the effort!

Paradigma11 · on March 15, 2023

There will just be checks that make sure that the generated content is not similar enough to violate copyrights of training material and that's it.

GolfPopper · on March 15, 2023

For the same reason that police having a person look up a car's owner by its license plate in a physical printed file is not the same as having a network of cameras and computers that track every car in the city.

km3r · on March 15, 2023

Yeah, I don't have any problem with that either. If a cop has a right to see me, he should be legally allowed to record me (in fact, I would prefer that all cop interactions were recorded). A camera + AI allows for massive cost savings on basic police work, enabling police to be more efficient. A camera has a lot less bias than a cop.

lupire · on March 15, 2023

It's because you (and all of us) have a teeny human brain, and these are terrible at remembering things, so the teeny little bits you can remember are protected under Fair Use.

anonzzzies · on March 15, 2023

I think it’s not very hard: if the AI companies believe the data they trained on is public domain/open because they scraped it off the internet, then their trained weights must be publicly available as well. They cannot claim ‘but training is expensive’; if they do, then they should pay fees for the hosting, storage and writing time of all the data they scraped. I prefer open weights as it’s more practical. Your weights have a sliver of GPL source in them? Well, that infects the entire thing, as GPL does: it is ours now too!

noogle · on March 14, 2023

The current (legal) answer is "unclear". There are indications that training is fine, but producing and using the generated content is questionable at the least. As with many IP issues, it will be settled only when someone takes it to court and goes all the way to a verdict. Some cases are actually being processed, but it might take years to get an answer.

sampo · on March 14, 2023

> The general problem of "AI"s being trained on copyrighted content

> The current (legal) answer is "unclear".

The European Union was ahead of the times for once. Article 4 of the 2019 copyright directive makes it legal to scrape the web and make and keep local copies of copyrighted works for data mining purposes, unless the copyright holders set up a machine-readable opt-out (such as a robots.txt file).

So legal in EU, "unclear" in US.
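To make the "machine-readable opt-out" concrete, here is a small Python sketch using the standard library's `urllib.robotparser` to check whether a given crawler is allowed to fetch a URL under a robots.txt file. The robots.txt content and the `ExampleAIBot` user agent are hypothetical, and `may_mine` is just an illustrative helper; whether a particular crawler honors robots.txt, or whether robots.txt alone satisfies the directive, is not something this shows.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt one made-up AI crawler out of the whole site
# while leaving every other crawler unrestricted.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_mine(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse from text, no network needed
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/blog/post"
    print(may_mine(ROBOTS_TXT, "ExampleAIBot", url))   # False
    print(may_mine(ROBOTS_TXT, "SomeSearchBot", url))  # True
```

Note that robots.txt is purely advisory: the parser only tells a well-behaved crawler what it is asked not to fetch.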

pklausler · on March 14, 2023

That does not, to me, automatically imply that an "AI" lawfully regurgitating copyrighted content is a "data mining purpose".

News-Dog · on March 15, 2023

Consider that an AI may stitch many snippets of copyrighted publications into a chimera of 'facts'.

'copyright fair use' : https://copyrightalliance.org/faqs/what-is-fair-use/

EamonnMR · on March 15, 2023

Does OpenAI respect robots.txt? Do we know?

antibasilisk · on March 14, 2023

Copyright's been dead since the internet was born. I really do think it's the least of our problems when it comes to abstract reasoning engines.

Swizec · on March 14, 2023

Becoming part of the cultural lexicon is the ultimate goal of thought leadership.

Just look at how many people say stuff like “Two women can’t make a baby in 4.5 months”. Someone (Brooks) had to invent, write down, and popularize that analogy.

raincole · on March 14, 2023

Why write your thoughts on the web when other humans are going to steal and paraphrase them? I mean... you're on HN. Don't tell me you haven't noticed people often regurgitating the thoughts of tech influencers like Paul Graham and Joel Spolsky.

Mordisquitos · on March 14, 2023

Anonymous people regurgitate the thoughts of well-known individuals such as Paul Graham and Joel Spolsky. The fact that their thoughts are regurgitated is a testament to how well known they are already and how much their content is read by other people. Nobody is going to steal their limelight only on the basis of paraphrasing their ideas. However, if someone does write original ideas of their own, they may gain some notoriety for themselves.

Now imagine that Paul Graham and Joel Spolsky were able to read everything being written by every anonymous unknown on the internet, and create content paraphrasing any and every original thought that was created by anonymous individuals at will. How do the original creators of these thoughts have any chance to succeed on their own merit, if Paul Graham and Joel Spolsky (who everyone knows already as sources of ideas) are able to write the same stuff as soon as the anonymous person has made it public?

meh8881 · on March 15, 2023

If Paul Graham is expressing every conceivable thought then he’s not a very interesting person to read because he has no perspective on anything.

But if a model starts generating better content than Paul Graham in a nice curated form, then yeah, Paul Graham ought to find a better way to spend his time because he is not adding value.

slg · on March 14, 2023

Imagine a friend asks for help in a class. You can either spend some time and try to teach them the subject or let them copy off you during the exam. The former generally feels good despite taking more effort. The latter often feels bad even if it doesn't impact you negatively in any way and helps your classmate more than if you did nothing.

The human to human connection that a blog or social media conversation creates feels a lot more like teaching your classmate while the AI feels a lot more like someone cheating off your work. Plus the AI didn't even bother to get your approval before copying from you. The whole thing feels ethically compromised regardless of the ultimate result.

JohnFen · on March 14, 2023

This was the place I reached. I'm not concerned about "stealing", exactly, but I don't want to contribute to this technology.

I think my days of sharing things freely on the web are over.

whywhywhywhy · on March 15, 2023

So maybe only post dumb and incorrect information.

Train it to be wrong on purpose, for a joke.

olalonde · on March 14, 2023

Because you can get points on Hacker News.

rebelde · on Oct 21, 2022

I wish motorcycles would emit some "noise" in the radio spectrum that says "Motorcycle over here!". My car gets the signal and does ... something with it. (Kids' shoes, too.) Not a perfect solution, but better than what we have now.

_carbyau_ · on Oct 21, 2022

Different take! I like the idea but have concerns over adversarial abuse - mainly because you've been vague over what it does.

But I guess a beeping noise in the car stereo to indicate direction of the [thing] would be ok.

rebelde · on Oct 15, 2022

This site is great and is one of my favorites. I occasionally check it and set an alarm on my phone. I will announce to the people that I am with "satellites passing in 3 minutes", run outside and impress people. Great fun. Thank you, thank you, thank you!

rebelde · on March 4, 2022

Eventbrite might be the new Meetup. I was searching Meetup for events of a certain type and got nothing interesting. Eventbrite had quite a few events.

Has anyone been successful using Eventbrite to find groups?