Microsoft, like the old Microsoft, seems to completely reject all these modern methods and use their own instead. So, you get a lot of spam and my legitimate emails are rejected.
Feature request for any service like this: Let "me" know if it is a school, so I know that I am probably dealing with minors, a public environment and a firewall. Of course, you need to do the work (rDNS to start) to identify the schools.
I work for IPinfo. We have higher level IP category information on our website for free.
We categorize ASNs and companies/organizations into 4 categories: ISP, Education, Hosting and Business. This ASN-level categorization is done mainly from WHOIS and other public internet records.
We don't sub-categorize by schools, universities, public research institutions, K-12, etc. The reason is accuracy. Even though I can understand the possible methods for doing this, the issue is that it cannot be done reliably at scale.
As a data provider, from our end we hope to provide the highest possible accuracy and vouch for the service we provide. For this level of classification, we will generally request users to say what data they need from us, and we try to help them come to a solution that they have to build on their own. They can do whatever classification they want to do based on their personal level of tolerance for accuracy.
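For anyone wanting to build the kind of do-it-yourself classification discussed above, here is a minimal sketch of the rDNS idea in Python. It is a rough heuristic, not how IPinfo or any other provider does it: the suffix and label hints are hypothetical examples, and many school networks have no PTR record or one that reveals nothing.

```python
import socket
from typing import Optional

# Hypothetical hints for illustration only; a real list would need curation.
SCHOOL_SUFFIX_HINTS = (".edu", ".ac.uk", ".edu.au")
SCHOOL_LABEL_HINTS = ("k12",)


def looks_like_school(hostname: str) -> bool:
    """Heuristic: True if the hostname suffix or one of its labels hints at a school."""
    host = hostname.lower().rstrip(".")
    if host.endswith(SCHOOL_SUFFIX_HINTS):
        return True
    return any(label in SCHOOL_LABEL_HINTS for label in host.split("."))


def reverse_lookup(ip: str) -> Optional[str]:
    """Return the reverse-DNS (PTR) hostname for an IP, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # covers socket.herror and socket.gaierror
        return None


def is_probably_school(ip: str) -> bool:
    """Combine the lookup and the heuristic; False when there is no PTR record."""
    hostname = reverse_lookup(ip)
    return hostname is not None and looks_like_school(hostname)
```

How many false positives and negatives this produces depends entirely on the hint lists, which is the "personal level of tolerance for accuracy" trade-off.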
In the past I've just called myself a full-stack designer (+/- extra words like UI/UX or developer) or some variant thereof. The hiring process isn't really aligned to finding such folks though for sure. Neither are internal progression systems at some companies.
My first internship was as a designer, then I freelanced for them as a dev, then when I graduated and it came time to join full-time, I was asked to pick one track or the other. :)
How do they plan to keep Google from using its search index of Reddit for training? Or keep OpenAI from using Common Crawl? Do they simply add "No AI" to their TOS?
Why write your thoughts on the web when AI/GPT is only going to steal and paraphrase it? Nobody sees what you write and everybody thinks GPT is the genius.
Just saw something today where the wife of TotalBiscuit, who died of cancer several years ago, is contemplating deleting all of his YouTube videos[1] to prevent people from using A.I. to make him say terrible things.
Did give me a bit of a pause about putting stuff out there. Although I think I'd still rather have my data be used for training A.I. than not (and I probably am already in the training data anyway, I believe I saw that one of the datasets it's been trained on was Hacker News comments).
Given that the "AI" community apparently treats intellectual property rights with wanton abandon, I can't say such a response would be unwarranted.
Dire circumstances call for drastic measures, as they say.
Quite a sad, but completely understandable reaction. The saddest part is probably that it's already too late to prevent people from generating TB deepfakes and other content. Cloning a voice takes half an hour of clips now; any downloaded live stream should be enough already.
It's sad to see AI on a path to destroy years of collected internet content. I expect the internet archive to receive loads of takedown requests in the coming months and years because of this.
I would like to make the opposite argument. All these days I didn't share my thoughts because everyone else was, and my voice would be drowned in a sea of voices. In the post-GPT-4 era it's easier to stand out if your thoughts are actually original and refreshing, because most people sound like their thoughts have been written by GPT.
To rephrase it another way, the reign of the conformist ends here and the reign of the contrarian begins now.
What if all the characters other than Waldo were dressing the same just because they were trying to ape each other to get fictitious points on social forums? The internet has trained an entire generation to make arguments to get validation on social media, and that definitely shows in the ideas that are put forward.
I was trying to say that what most people say is mostly unoriginal and is very reminiscent of GPT style writing. What data GPT trains on or pays attention to is another question.
The cat is out of the bag, and I don't see any reason training should be any more controlled than me personally viewing something and 'training' my brain on it. Using either to duplicate copyrighted works is already clearly illegal.
It is illegal for you to download copyrighted material and distribute it as your own. Models trained on such data can (and are statistically more likely to) produce output similar to their (training) input.
So training must consider licencing where copyright material is used and not consume all data.
Your brain is not a model. You cannot reproduce most of what you see. You're not "training" your brain by glancing at an image, as your recall concerning that image will be terrible.
My brain can certainly recreate something it’s seen before. And it can certainly create something similar to a thing it’s seen before. It’s legal to do the latter and illegal to do the former. Imperfections in the exact recreations don’t affect the legality of it.
Am I violating copyright law because I am merely capable of producing a copy of something? Obviously not. Why should the model be?
For the same reason that the police being able to have a person look up in a physical printed file who owns a particular car via its license plate is not the same as having a network of cameras and computers that track every car in the city.
Yeah, I don't have any problem with that either. If a cop has a right to see me, he should be legally allowed to record me (and in fact I would prefer all cop interactions were recorded). A camera + AI allows for massive cost savings on basic police work, enabling police to be more efficient. A camera has a lot less bias than a cop.
It's because you (and all of us) have a teeny human brain, and these are terrible at remembering things, so the teeny little bits you can remember are protected under Fair Use.
I think it’s not very hard; if the AI companies believe the data they trained on is public domain/open because they scraped it off the internet, then their trained weights must be publicly available as well. They cannot claim ‘but training is expensive’; if they do, then they should pay fees for the hosting, storage and writing time of all the data they scraped. I prefer open weights as it’s more practical. Your weights have a sliver of GPL source in them? Well, that infected the entire thing, as GPL does: it is ours now too!
The current (legal) answer is "unclear". There are indications that training is fine, but producing and using the generated content is questionable at least. As with many IP issues, it will be settled only when someone tries it in court and goes all the way to a verdict. Some cases are actually being processed, but it might take years to get an answer.
> The general problem of "AI"s being trained on copyrighted content
> The current (legal) answer is "unclear".
The European Union was ahead of its time for once. The 2019 copyright directive, article 4, makes it legal to scrape the web and to make and keep local copies of copyrighted works for data mining purposes, unless the copyright holders set up a machine-readable exception (such as a robots.txt file).
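To illustrate what honoring a machine-readable opt-out could look like on the crawler side, here is a small sketch using Python's standard `urllib.robotparser`. The robots.txt content and the `ExampleMiningBot` user agent are made up for the example, and whether robots.txt actually counts as a valid reservation under the directive is a legal question this code does not answer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opts one mining bot out, allows everyone else.
ROBOTS_TXT = """\
User-agent: ExampleMiningBot
Disallow: /

User-agent: *
Allow: /
"""


def may_mine(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse in memory, no network access
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/post/1"
    print(may_mine(ROBOTS_TXT, "ExampleMiningBot", url))
    print(may_mine(ROBOTS_TXT, "SomeOtherAgent", url))
```

A real crawler would fetch the site's own robots.txt first (for example with `RobotFileParser.set_url` and `read`) and skip any URL for which the check fails.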
Becoming part of the cultural lexicon is the ultimate goal of thought leadership.
Just look at how many people say stuff like “Two women can’t make a baby in 4.5 months”. Someone (Brooks) had to invent, write down, and popularize that analogy.
Why write your thoughts on the web when other humans are going to steal and paraphrase it? I mean... you're on HN. Don't tell me you didn't notice people often regurgitate tech influencers like Paul Graham and Joel Spolsky's thoughts.
Anonymous people regurgitate the thoughts of well-known individuals such as Paul Graham and Joel Spolsky. The fact that their thoughts are regurgitated is a testament to how well known they are already and how much their content is read by other people. Nobody is going to steal their limelight only on the basis of paraphrasing their ideas. However, if someone does write original ideas of their own, they may gain some notoriety for themselves.
Now imagine that Paul Graham and Joel Spolsky were able to read everything being written by every anonymous unknown on the internet, and create content paraphrasing any and every original thought that was created by anonymous individuals at will. How do the original creators of these thoughts have any chance to succeed on their own merit, if Paul Graham and Joel Spolsky (who everyone knows already as sources of ideas) are able to write the same stuff as soon as the anonymous person has made it public?
If Paul Graham is expressing every conceivable thought then he’s not a very interesting person to read because he has no perspective on anything.
But if a model starts generating better content than Paul Graham in a nice curated form, then yeah, Paul Graham ought to find a better way to spend his time because he is not adding value.
Imagine a friend asks for help in a class. You can either spend some time and try to teach them the subject or let them copy off you during the exam. The former generally feels good despite taking more effort. The latter often feels bad even if it doesn't impact you negatively in any way and helps your classmate more than if you did nothing.
The human to human connection that a blog or social media conversation creates feels a lot more like teaching your classmate while the AI feels a lot more like someone cheating off your work. Plus the AI didn't even bother to get your approval before copying from you. The whole thing feels ethically compromised regardless of the ultimate result.
I wish Motorcycles would emit some "noise" in the radio spectrum that says "Motorcycle over here!". My car gets the signal and does ... something with it. (Kids' shoes, too.) Not a perfect solution, but better than what we have now.
This site is great and is one of my favorites. I occasionally check it and set an alarm on my phone. I will announce to the people that I am with "satellites passing in 3 minutes", run outside and impress people. Great fun. Thank you, thank you, thank you!
Eventbrite might be the new Meetup. I was searching Meetup for events of a certain type and got nothing interesting. Eventbrite had quite a few events.
Anyone successful using Eventbrite to find groups?