Hacker News | nikolayasdf123's comments

> scraping LinkedIn profiles

is this legal? last time I checked, linkedin.com/robots.txt does not allow scraping without explicit approval from linkedin


If it is publicly available information it is legal to scrape it, regardless of what robots.txt says.

See: https://www.webspidermount.com/is-web-scraping-legal-yes/


As an attorney (and this is not legal advice), I don't think it's quite that simple. The court held that the CFAA does not proscribe scraping of pages to which the user already has access and in a way that doesn't harm the service, and thus it's not a crime. But there are other mechanisms that might impact a scraper, such as civil liability, that have not been addressed uniformly by the courts yet. And if you scrape in such a way that does harm the operator (e.g. by denying service), it might still be unlawful, even criminal.

There's a relevant footnote in the cited hiQ Labs v. LinkedIn case:

"LinkedIn’s cease-and-desist letter also asserted a state common law claim of trespass to chattels. Although we do not decide the question, it may be that web scraping exceeding the scope of the website owner’s consent gives rise to a common law tort claim for trespass to chattels, at least when it causes demonstrable harm."

They also said: "Internet companies and the public do have a substantial interest in thwarting denial-of-service attacks and blocking abusive users, identity thieves, and other ill-intentioned actors."

It's a good idea to take legal conclusions from media sites with a grain of salt. Same goes for any legal discussion on social media, including HN. If you want a thorough analysis of legal risk--either for your business or for personal matters--hire a good lawyer.


Smart


Or run your legal questions through a frontier model and then have a lawyer verify the answers. You can save a lot of money and time.

Yes, all LLM caveats apply. Do your due diligence. But they are quite good at this now.


Have you actually tried this approach? I’m curious as to the result, especially when you took it to your lawyer. Not a contract review but a business practice risk evaluation.


Some context from coverage of GPT 5:

https://legaltechnology.com/2025/08/08/openai-launches-gpt-5...

https://www.artificiallawyer.com/2025/08/08/gpt-5-tops-harve...

Remember when "asking for a friend" was a thing?

Today's expression is "I asked a friend". You can try that when talking to your lawyer about your latest ChatGPT session; they might still believe you.


Hmm this is a good idea too


what nonsense. they explicitly say "do not scrape us unless we approve". they put up paywalls and captchas. their service is literally selling access to users' data.

now you are scraping it. this is a direct violation and direct harm to their business, despite their explicit statements telling you to stop.

you lose the case, it is clear as day.


what nonsense. this is the online equivalent of "sovereign citizens". go and try it, and get yourself into jail.


Do not confuse strong language with strong argument. Yours is the former not the latter.


LinkedIn has an API. So why scrape?


because they are pulling what they are not supposed to. they are doing it illegally. that's why.


> they are doing it illegally.

ToS aren't real laws, mate.

Edit: oops, just saw a message from the creator of this thing saying he gets the data in the most illegal possible ways. They have no salvation.

It is possible to do what they propose legally, tho, if the "agent" is just the user's computer.


ToS are legally binding contracts. they are there for a reason.

contracts are not laws themselves, but properly drafted ToS (I bet LinkedIn's are) hold very real legal power.


We get our data from third party data vendors who we assume have gotten explicit approval from linkedin!


You assume! Such due diligence!


Unfortunately not able to get into their codebase


Or yours...


What would you like to see?

Can tell you :)


you're building a tool that is designed to sink its tentacles into people's most personal accounts and take unsupervised automated actions with them, using a technology that has serious, well known, documented security issues. you haven't demonstrated any experience with, awareness of, or consideration for the security issues at hand, so the ideal amount of code to share would likely be all of it.


Fair enough, it makes sense not to trust us!

We like to believe we're pretty trustworthy, and do our best to make everything secure.


i actually really like your product for what it's worth. don't listen to the haters. hackers build things.

i just won't use it, and nobody should, unless they can understand exactly how it works and reason for themselves about the risks they are taking. you clearly work hard and care deeply about what you are building, and it will be very useful. but it has the potential to cause widespread harm, no matter how trustworthy you are, how much you care about it, or what your intentions are.

with respect to user security and privacy, doing your best is not much better than yolo security. the minimum standard should be to research the threat landscape, study the state of the art in methods to mitigate those threats, implement them, and test them thoroughly, yourselves and through vendors. iterate through that process continuously, alongside your development. it will never end. or, you can open source it and the internet does this for you for free. build something people love, grow traction, convert that to money. THEN figure out how to make money from them, not the other way around. or, more likely, some combination of all of the above.

someone else linked you to simon willison's lethal trifecta page; i would absolutely start there and read everything linked as well. pangea and specterops both do good work in the llm pentesting space, and i'm sure there are more.


> we build own MCP

> we use existing models via their API

> we use existing tools/services/platforms

> ChatGPT/OpenWebUI-like web interface

> mostly uses text, no image, no desktop control (?)

i can hardly see what this app brings. also, it is paid and requests are routed to someone else? shouldn't this be free, local, and bring-your-own-key, given that things like ollama/llama.cpp already exist?


We actually don't use MCP!

We just make our own tools in-house :)

Hmm, local open source models are something we've thought about, but we currently haven't found them to be usable


Why __don't__ you use MCP?


We find that their quality currently isn't there yet for a general system. They tend to be designed to use just that single app, rather than to be used in parallel with other apps.


But you are compatible with MCP, right? Otherwise users are going to miss out on the MCP ecosystem. And you are going to be spending all your time developing your own versions of MCP plugins. Wouldn't it be easier to improve the existing ones?


It's a bit more complicated. We have a fully custom single-agent architecture, sort of like Manus, that isn't fully compatible with MCP


MCP is what you use to make tools you own compatible with agents (like Claude Code) that you don't --- or vice versa. It's not doing anything useful in the scenario where you own both the tool calling code and the agent.
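A minimal sketch of what "owning both sides" can look like: the agent loop hands a model-emitted tool call to a plain in-process registry, with no protocol layer in between. The tool names (`search_email`, `attach_file`), the `dispatch` helper, and the `{"name": ..., "arguments": {...}}` call format are all hypothetical illustrations, not how the product discussed here works.

```python
import json
from typing import Callable, Dict


# Hypothetical in-house tools; real ones would call actual services.
def search_email(query: str) -> list:
    return [f"message matching {query!r}"]


def attach_file(message_id: str, path: str) -> str:
    return f"attached {path} to {message_id}"


# Registry mapping tool names to ordinary Python callables.
TOOLS: Dict[str, Callable[..., object]] = {
    "search_email": search_email,
    "attach_file": attach_file,
}


def dispatch(tool_call_json: str) -> object:
    """Run a tool call of the assumed form {"name": ..., "arguments": {...}}."""
    call = json.loads(tool_call_json)
    tool = TOOLS.get(call["name"])
    if tool is None:
        # Unknown tools are rejected instead of guessed at.
        raise KeyError(f"unknown tool: {call['name']}")
    return tool(**call["arguments"])
```

For example, `dispatch('{"name": "attach_file", "arguments": {"message_id": "m1", "path": "report.pdf"}}')` returns `"attached report.pdf to m1"`. MCP would expose these same functions behind a protocol so that third-party agents could discover and call them. Inside a single codebase, a dictionary lookup does the same job.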


The question is whether the tools are limited to what they offer.


Are you sure they want to provide access to arbitrary random tools other people wrote? It's easy enough to add MCP support to native tool calls, but I don't know that that's a great idea given their problem domain.


that just sounds like you have no idea what MCP is. I don't even like MCPs, but I can't understand what angle you are coming from, unless they specifically mean using external MCPs instead of your own, since it is, you know, open source...


> We just make our own tools in-house :)

And that's somehow a good thing?


It's useful for quality.

For example, we can read PDFs and attach them to Gmail messages, which not many others can do, since we have our own internal storage API.
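At the protocol level, attaching a PDF means building a MIME message with an `application/pdf` part. The Gmail API's `users.messages.send` endpoint accepts such a message base64url-encoded in its `raw` field. Below is a minimal stdlib-only sketch of that step. The helper name `build_raw_message_with_pdf`, the addresses, and the file contents are illustrative assumptions. The storage lookup and the actual API call are omitted, and this is not the poster's implementation.

```python
import base64
from email.message import EmailMessage


def build_raw_message_with_pdf(sender: str, to: str, subject: str, body: str,
                               pdf_bytes: bytes, filename: str) -> str:
    """Return a base64url-encoded MIME message with one PDF attachment (hypothetical helper)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = to
    msg["Subject"] = subject
    msg.set_content(body)  # text/plain body part
    # add_attachment converts the message to multipart/mixed.
    msg.add_attachment(pdf_bytes, maintype="application", subtype="pdf",
                       filename=filename)
    # Gmail's "raw" field expects URL-safe base64 of the full message bytes.
    return base64.urlsafe_b64encode(msg.as_bytes()).decode("ascii")
```

The PDF bytes are the part that has to come from somewhere, which is presumably where an internal storage API matters.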


oh so now we are flagging people that think not having MCP support is bad?


HN is out of control.


[flagged]


> Out of all the AI slopups I've seen, y'all might the worst. Have fun, clowns.

You can't attack others like this here. We've banned the account.

If you don't want to be banned, you're welcome to email hn@ycombinator.com and give us reason to believe that you'll follow the rules in the future. They're here: https://news.ycombinator.com/newsguidelines.html.


We don't use open source models as of right now!


yes, it is pretty much a tax at this point. they are not going after small websites or giants, even though both are privacy nightmares. they are going after SHEIN because it has money. disgraceful


> non 100% correctness of kernels

wouldn't the model stop working properly if the kernels are even slightly off?

aren't kernels part of the training stack for models? am I missing anything?


I believe their speedup is computed _assuming they can easily fix the correctness bugs in the kernels_.

In practice, with slight differences the model will feel almost lobotomized.
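To make "not 100% correct" concrete: a common way to validate an optimized kernel is to run it against a slow reference implementation on the same inputs and compare the results within a floating-point tolerance, because a legitimately reordered reduction changes rounding without changing the math. Below is a minimal stdlib-only sketch. `reference_matmul`, `candidate_matmul`, the injected off-by-one bug, and the tolerances are all illustrative assumptions, not the article's actual test harness.

```python
import math
import random


def reference_matmul(a, b):
    """Naive triple-loop matrix multiply, used as ground truth."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def candidate_matmul(a, b, drop_last=False):
    """Hypothetical 'optimized' kernel: accumulates in reverse order, which
    changes float rounding but not the math. drop_last=True simulates an
    off-by-one bug that skips the last term of every dot product."""
    k = len(b) - 1 if drop_last else len(b)
    cols = list(zip(*b))
    return [[sum(row[p] * col[p] for p in reversed(range(k))) for col in cols]
            for row in a]


def matches_reference(out, ref, rel_tol=1e-6, abs_tol=1e-9):
    """True if every element agrees within tolerance (exact equality is too strict for floats)."""
    return all(math.isclose(u, v, rel_tol=rel_tol, abs_tol=abs_tol)
               for row_o, row_r in zip(out, ref) for u, v in zip(row_o, row_r))


if __name__ == "__main__":
    random.seed(0)
    a = [[random.uniform(-1, 1) for _ in range(64)] for _ in range(8)]
    b = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(64)]
    ref = reference_matmul(a, b)
    print("reordered:", matches_reference(candidate_matmul(a, b), ref))
    print("off-by-one:", matches_reference(candidate_matmul(a, b, drop_last=True), ref))
```

The reordered kernel should pass and the off-by-one kernel should fail. A speedup number is only meaningful for kernels that pass this kind of check.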


The article is referring to GPU compute kernel (https://en.wikipedia.org/wiki/Compute_kernel), not the term kernel used in ML/NN/etc.


…aren't they the same thing


They're not, but I also misunderstood the original question, they're referring to the correct definition of kernel. I thought they were confusing the GPU kernel with https://en.wikipedia.org/wiki/Kernel_method or https://en.wikipedia.org/wiki/Kernel_(image_processing)


> habr.com

interesting to see this forum show up again.

i remember, 15 years ago, there were posts about a DIY drone from some random guy, with lots of theoretical physics deriving stability conditions. it got a lot of criticism. looking back now, and following what DJI is doing with sensors, his approach was totally wrong and the community nailed it with its feedback. the forum got some extravagant ideas and some worthy criticism. at least back then.


I remember visiting this site daily 10-15 years ago, in russian, ofc. The moderation was super high quality, the karma system worked great, and the content quality was astonishing. Then the site changed owners, who tried heavily monetizing it with corpo-pseudo-blogpost-marketing crap, and it all went downhill from there


habr is an institution. it's like the "runet hn", minus wild west vc ecosystem, plus integrated blog posting like lj ogs intended to. probably helps a lot with original work like TFA getting traction. more power to that!

runet sites of that era were often born out of the hacker's characteristic contrarian attitude "because we can". attempts to monetize them in more recent years are bound to accomplish little more than fuck up the content quality and/or the "owner cashes out and opens cafe" thing.

nevertheless, to this day, when i think habrahabr, i think way higher bar for technical competence than hn. it's all in the attitude.


What are the modern equivalents of habr?


There's probably none. The Russian Internet has been Eternal Septembered too much for something similar to appear.


if i knew any, i sure as fuck wouldn't post them on hn of all places.


It went downhill when they allowed getting an invitation via a single blog post, requiring just one person to like it enough to give an invitation. Such a post wasn't hard to write: just translate something popular from Hacker News before anyone else does.

Shortly after, it became hilariously easy to farm and manipulate karma balances across the entire site. With 50 accounts (sockpuppets or real people, it made no difference) you could create a new account a day.

Monetization started when it was already in a death spiral.


don't forget the awful redesign, including completely replacing the post formatter

all accumulated mastery of creating posts by experienced authors - gone overnight


It was also notoriously politics-free, until something happened.


hold on, I own a domain (with, say, Let's Encrypt certs), I have my own keys for signing WebBotAuth tokens, I host the public cert on my domain...

where does Cloudflare come in as a gatekeeper? what do they have to do with me signing my requests and my tokens? am I missing something?


Nothing stops you from signing your own tokens, but if you want those tokens to actually help you get past CF's WAF, then you have to convince (or pay) them to trust you. It's kind of like how you can sign your own public TLS certs, but they won't do you much good if the browser vendors don't trust them.


get a visa in a country that allows remote work. Portugal, Thailand, and China now issue good visas that let you work for a company anywhere in the world and stay in the country for many years, or start your own business.

don't let your employer keep you and your family hostage with your legal status


> H1B unknowns

this. most countries have similar policies. been there, and seen so many others going through this in the UK, USA, Japan, Korea, Singapore. it really damages your life.

to everyone: check other countries (Portugal, Thailand, Japan) that give you residency for years and allow you to work remotely

don't let your employer hold you and your family hostage with your legal status


>Japan

Unfortunately you can only work here for up to 6 months as a digital nomad, and it's not residency; it's basically just an extended tourist visa.

Portugal's digital nomad visa seems good but their immigration system is apparently extremely dysfunctional.


Japan last year had the HSP visa, which was a good deal (PR in 1 year + no Japanese language requirement). fastest permanent immigration track I have ever seen


Hong Kong has good one right now with TTPS

China will be opening a new visa track for tech in October; it will be interesting to see


That would be nice, but if you want to immigrate, that’s the current deal.


C


Go

