Hacker News | rebelde's comments

Google is paying Reddit instead of just taking it for free like they do from all other websites?


Second line in the article: "In an update on Thursday, Reddit announced it will start providing Google “more efficient ways to train models.”"

They will be helping Google create a real-time pipeline to Reddit's data.


Body damage far from the battery for me was normal price. Anything close to the battery was outrageously expensive for me.


Microsoft, like the old Microsoft, seems to completely reject all these modern methods and use their own instead. So, you get a lot of spam and my legitimate emails are rejected.


Feature request for any service like this: Let "me" know if it is a school, so I know that I am probably dealing with minors, a public environment and a firewall. Of course, you need to do the work (rDNS to start) to identify the schools.

I love the speed of the responses!
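The rDNS starting point mentioned above could look roughly like the sketch below. It is only an illustration: the suffix and label hints (`.edu`, `.ac.uk`, `.edu.au`, `k12`) and all function names are assumptions made up for this example, and a hostname match is a weak signal at best, not proof that an address belongs to a school.

```python
import socket
from typing import Optional

# Hypothetical hints; a real list would be much longer and still imperfect.
SCHOOL_SUFFIXES = (".edu", ".ac.uk", ".edu.au")
SCHOOL_LABELS = ("k12",)


def looks_like_school(hostname: str) -> bool:
    """Guess from a hostname alone whether it belongs to a school."""
    host = hostname.lower().rstrip(".")
    if host.endswith(SCHOOL_SUFFIXES):
        return True
    # US K-12 districts often use names like district.k12.ca.us
    return any(label in SCHOOL_LABELS for label in host.split("."))


def reverse_lookup(ip: str) -> Optional[str]:
    """Return the rDNS (PTR) hostname for an IP, or None if there is none."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # covers socket.herror and socket.gaierror
        return None


def is_probably_school(ip: str) -> bool:
    """Combine the lookup and the heuristic; False when rDNS is missing."""
    hostname = reverse_lookup(ip)
    return hostname is not None and looks_like_school(hostname)
```

Many addresses have no PTR record at all, so a service would likely need to combine this with WHOIS or ASN data before showing a "school" flag.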


I work for IPinfo. We have higher level IP category information on our website for free.

We categorize ASNs and companies/organizations into 4 categories: ISP, Education, Hosting and Business. This ASN-level categorization is done mainly from WHOIS and other public internet records.

We don't sub-categorize by schools, universities, public research institutions, K-12 etc. The reason is accuracy. Even though I can understand the possible methods for doing this, the issue is that it cannot be done reliably at scale.

As a data provider, from our end we hope to provide the highest possible accuracy and vouch for the service we provide. For this level of classification, we will generally request users to say what data they need from us, and we try to help them come to a solution that they have to build on their own. They can do whatever classification they want to do based on their personal level of tolerance for accuracy.


IPINFO.io has that info, but I too would like to have this information in a free service.


Universities and post-secondary educational institutions can have K-12 schools attached, especially if they study education or are isolated.


Seems like schools should opt-in by submitting their IP address/range


Is there a name for designers who know CSS? I guess not. It would make it easier for me to search for and hire them if I could identify them better.


In the past I've just called myself a full-stack designer (+/- extra words like UI/UX or developer) or some variant thereof. The hiring process isn't really aligned to finding such folks though for sure. Neither are internal progression systems at some companies.

My first internship was as a designer, then I freelanced for them as a dev, then when I graduated and it came time to join full-time, I was asked to pick one track or the other. :)


How do they plan to keep Google from using its search index of Reddit for training? Or keep OpenAI from using Common Crawl? Do they simply add "No AI" to their TOS?


Why write your thoughts on the web when AI/GPT is only going to steal and paraphrase it? Nobody sees what you write and everybody thinks GPT is the genius.


Just saw something today where the wife of TotalBiscuit, who died of cancer several years ago, is contemplating deleting all of his Youtube videos[1] to prevent people from using A.I. to make him say terrible things.

Did give me a bit of a pause about putting stuff out there. Although I think I'd still rather have my data be used for training A.I. than not (and I probably am already in the training data anyway, I believe I saw that one of the datasets it's been trained on was Hacker News comments).

[1]: https://kotaku.com/totalbiscuit-john-bain-youtube-delete-vid...


Given that the "AI" community apparently couldn't care less about treating intellectual property rights with wanton abandon, I can't say such a response would be unwarranted.

Dire circumstances call for drastic measures, as they say.


Quite a sad, but completely understandable reaction. The saddest part is probably that it's already too late to prevent people from generating TB deepfakes and other content. Cloning a voice takes half an hour of clips now; any downloaded live stream should be enough already.

It's sad to see AI on a path to destroy years of collected internet content. I expect the Internet Archive to receive loads of takedown requests in the coming months and years because of this.


I would like to make the opposite argument. All these days I didn't share my thoughts because everyone else was and my voice would be drowned in a sea of voices. In the post-GPT-4 era it's easier to stand out if your thoughts are actually original and refreshing, because most people sound like their thoughts have been written by GPT.

To rephrase it another way, the reign of the conformist ends here and the reign of the contrarian begins now.


A lovely sentiment in theory, but Waldo is still perniciously difficult to find even though he dresses differently from every other character.


What if all characters other than Waldo were just dressing the same because they were trying to ape each other to get fictitious points on social forums? The internet has trained an entire generation to make arguments to get validation on social media, and that definitely shows in the ideas that are put forward.


Or just the reign of brevity. Sheer volume is no longer impressive.


Great point. More volume in explaining the same thought is more GPT-like.


Your ideas are low probability autocomplete. GPT wants popular ideas, not novel ideas.


I was trying to say that what most people say is mostly unoriginal and is very reminiscent of GPT style writing. What data GPT trains on or pays attention to is another question.


That's why I keep my content as low quality as possible - keeps the machines humble.


I'll just run it through an AI upscaler before I run it through the AI language model.


We don't need an upscaler, we need an upclasser so all the ASCII Dickbutts drawn get little top hats and monocles put on them.


The general problem of "AI"s being trained on copyrighted content needs to be discussed more thoroughly, I think.


Every time I bring this up, people accuse me of resisting progress, "the cat's out of the bag", etc.

It has been frustrating.


The cat is out of the bag, and I don't see any reason training should be any more controlled than me personally viewing something and 'training' my brain on it. Using either to duplicate copyrighted works is already clearly illegal.


It is illegal for you to download copyrighted material and distribute it as your own. Models trained on such data can (and are statistically more likely to) produce output similar to their training input.

So training must consider licencing where copyrighted material is used, and must not consume all data.

Your brain is not a model. You cannot reproduce most of what you see. You're not "training" your brain by glancing at an image, as your recall concerning that image will be terrible.


My brain can certainly recreate something it’s seen before. And it can certainly create something similar to a thing it’s seen before. It’s legal to do the latter and illegal to do the former. Imperfections on the exact recreations don’t affect the legality of it.

Am I violating copyright law because I am merely capable of producing a copy of something? Obviously not. Why should the model be?


>It is illegal for you to download copyrighted material and distribute it as your own

I'm sure the millions of people who violate copyright law daily with absolutely no repercussions care very much about that.


Millions of people don't pay taxes and cross the road in the wrong place.

You can't set up a cinema and charge for tickets to the movies you stole.

It's the money-making side that matters, not individuals in a private house.


Ok, so then let's violate copyright and open source the effort!


There will just be checks that make sure that the generated content is not similar enough to violate copyrights of training material and that's it.
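One very rough way to picture such a check is word n-gram overlap between generated text and known source texts. The sketch below is purely illustrative: the function names, the 5-word window and the 0.5 threshold are all assumptions for this example, and passing or failing this check says nothing about what a court would consider infringing.

```python
def word_ngrams(text: str, n: int = 5) -> set:
    """Return the set of n-word sequences in the text (lowercased)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(generated: str, source: str, n: int = 5) -> float:
    """Fraction of the generated text's n-grams that also occur in the source."""
    gen = word_ngrams(generated, n)
    if not gen:
        return 0.0
    return len(gen & word_ngrams(source, n)) / len(gen)


def too_similar(generated: str, sources: list, threshold: float = 0.5, n: int = 5) -> bool:
    """Flag output whose overlap with any single source reaches the threshold."""
    return any(overlap_ratio(generated, s, n) >= threshold for s in sources)
```

A check like this only catches near-verbatim copying; paraphrase or stylistic imitation would pass straight through it.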


For the same reason that the police being able to have a person look up in a physical printed file who owns a particular car via its license plate is not the same as having a network of cameras and computers that track every car in the city.


Yeah, I don't have any problem with that either. If a cop has a right to see me, he should be legally allowed to record me (and in fact I would prefer all cop interactions were recorded). A camera + AI allows for massive cost savings on basic police work, enabling police to be more efficient. A camera has a lot less bias than a cop.


It's because you (and all of us) have a teeny human brain, and these are terrible at remembering things, so the teeny little bits you can remember are protected under Fair Use.


I think it’s not very hard; if the AI companies believe the data they trained on is public domain/open because they scraped it off the internet, then their trained weights must be publicly available as well. They cannot claim ‘but training is expensive’; if they do, then they should pay fees for the hosting and storage and writing time of all data they scraped. I prefer open weights as it’s more practical. Your weights have a sliver of GPL source in them? Well, that infected the entire thing, as GPL does: it is ours now too!


The current (legal) answer is "unclear". There are indications that training is fine, but producing and using the generated content is questionable at least. As with many IP issues, it will be settled only when someone takes it to court and goes all the way to a verdict. Some cases are actually in progress, but it might take years to get an answer.


> The general problem of "AI"s being trained on copyrighted content

> The current (legal) answer is "unclear".

The European Union was ahead of its time for once. The 2019 copyright directive, Article 4, makes it legal to scrape the web and make and keep local copies of copyrighted works for data mining purposes, unless the copyright holders set up a machine-readable exception (such as a robots.txt file).

So legal in EU, "unclear" in US.
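To make the "machine-readable exception" concrete, here is a minimal sketch of how a well-behaved crawler might consult robots.txt before fetching a page, using Python's standard `urllib.robotparser`. The robots.txt content, the `example.com` URL and the `ExampleBrowser` agent are made up for illustration, and `GPTBot` is used only as an example crawler name; whether a given crawler actually honors these rules is up to its operator, and whether robots.txt satisfies the directive's opt-out wording is a legal question this code does not answer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site might publish to opt out of one crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

In real use the parser would load the site's live file with `set_url(...)` and `read()` instead of a hard-coded string.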


That does not, to me, automatically imply that an "AI" lawfully regurgitating copyrighted content is a "data mining purpose".


Consider that an AI may stitch many snippets of copyrighted publications into a chimera of 'Facts'.

'copyright fair use' : https://copyrightalliance.org/faqs/what-is-fair-use/


Does OpenAI respect robots.txt? Do we know?


Copyright's been dead since the internet was born. I really do think it's the least of our problems when it comes to abstract reasoning engines.


Becoming part of the cultural lexicon is the ultimate goal of thought leadership.

Just look at how many people say stuff like “Two women can’t make a baby in 4.5 months”. Someone (Brooks) had to invent, write down, and popularize that analogy.


Why write your thoughts on the web when other humans are going to steal and paraphrase it? I mean... you're on HN. Don't tell me you didn't notice people often regurgitate tech influencers like Paul Graham and Joel Spolsky's thoughts.


Anonymous people regurgitate the thoughts of well-known individuals such as Paul Graham and Joel Spolsky. The fact that their thoughts are regurgitated is a testament to how well known they are already and how much their content is read by other people. Nobody is going to steal their limelight only on the basis of paraphrasing their ideas. However, if someone does write original ideas of their own, they may gain some notoriety for themselves.

Now imagine that Paul Graham and Joel Spolsky were able to read everything being written by every anonymous unknown on the internet, and create content paraphrasing any and every original thought that was created by anonymous individuals at will. How do the original creators of these thoughts have any chance to succeed on their own merit, if Paul Graham and Joel Spolsky (who everyone knows already as sources of ideas) are able to write the same stuff as soon as the anonymous person has made it public?


If Paul Graham is expressing every conceivable thought then he’s not a very interesting person to read because he has no perspective on anything.

But if a model starts generating better content than Paul Graham in a nice curated form, then yeah, Paul Graham ought to find a better way to spend his time because he is not adding value.


Imagine a friend asks for help in a class. You can either spend some time and try to teach them the subject or let them copy off you during the exam. The former generally feels good despite taking more effort. The latter often feels bad even if it doesn't impact you negatively in any way and helps your classmate more than if you did nothing.

The human to human connection that a blog or social media conversation creates feels a lot more like teaching your classmate while the AI feels a lot more like someone cheating off your work. Plus the AI didn't even bother to get your approval before copying from you. The whole thing feels ethically compromised regardless of the ultimate result.


This was the place I reached. I'm not concerned about "stealing", exactly, but I don't want to contribute to this technology.

I think my days of sharing things freely on the web are over.


So maybe only post dumb and incorrect information.

Train it to be wrong on purpose, for a joke.


Because you can get points on Hacker News.


I wish Motorcycles would emit some "noise" in the radio spectrum that says "Motorcycle over here!". My car gets the signal and does ... something with it. (Kids' shoes, too.) Not a perfect solution, but better than what we have now.


Different take! I like the idea but have concerns over adversarial abuse - mainly because you've been vague over what it does.

But I guess a beeping noise in the car stereo to indicate direction of the [thing] would be ok.


This site is great and is one of my favorites. I occasionally check it and set an alarm on my phone. I will announce to the people that I am with "satellites passing in 3 minutes", run outside and impress people. Great fun. Thank you, thank you, thank you!


Eventbrite might be the new Meetup. I was searching Meetup for events of a certain type and got nothing interesting. Eventbrite had quite a few events.

Anyone successful using Eventbrite to find groups?

