Hacker News

>after introducing proxies my crawl times grew by an order of magnitude from minutes to hours

Yeah, same experience. Right now I use luminati.io datacenter IPs, which work OK. Does anyone know of a cheaper option that works well? I'm scraping tens of millions of pages a month.



I suppose the economics come into play much as they do for Mailchimp: the lower the pricing, the more scammy clients they attract and the more IPs they lose to blacklists.


Not really. Mail is default bad: you need to build up trust just to get a tiny amount of deliverability. Fetching webpages is default good until you're detected as bad.

The thing is, a lot of scraping goes unnoticed. Maybe you get an extra thousand hits here and there. But every spam campaign gets noticed and results in some percentage of spam complaints from users.


You could use https://oxylabs.io/ or buy some regular VPN accounts and build an HTTP proxy wrapper around them. It's extremely cheap and works well. There are a lot of existing projects on GitHub for that too.


oxylabs.io seems more expensive than luminati, with a minimum of $178/month. It's not clear whether they charge for bandwidth.

The problem with VPNs is that the IPs are shared and it's hard to get a lot of them. Are there any specific providers where I could get, say, 100 dedicated US IPs for a reasonable price?


Depends on your scale and what you negotiate. The shared nature of VPNs is usually not a huge problem, but that depends on your use case. Most VPN providers offer better deals to bigger customers, so you can buy multiple accounts in bulk, each with, for example, 5 connections, and use those. For the US specifically, providers often have hundreds of servers and VPN configs, so you can build something out of that.
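
For what it's worth, the "build something out of that" part could look roughly like the sketch below. It assumes you already run one local HTTP proxy per VPN connection; the 127.0.0.1 ports, the `ProxyPool` class, and the `fetch` helper are all made up for illustration and are not any provider's API. It just rotates requests round-robin across the endpoints and stops using one once a request through it fails.

```python
import urllib.request


class ProxyPool:
    """Round-robin over proxy endpoints, skipping ones marked as blocked."""

    def __init__(self, endpoints):
        self._endpoints = list(endpoints)
        self._blocked = set()
        self._index = 0

    def next_proxy(self):
        # Look at each endpoint at most once per call.
        for _ in range(len(self._endpoints)):
            proxy = self._endpoints[self._index]
            self._index = (self._index + 1) % len(self._endpoints)
            if proxy not in self._blocked:
                return proxy
        raise RuntimeError("no usable proxies left")

    def mark_blocked(self, proxy):
        self._blocked.add(proxy)


def fetch(url, pool, timeout=10):
    """Fetch url through the next proxy in the pool (hypothetical helper)."""
    proxy = pool.next_proxy()
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    )
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.read()
    except OSError:
        # Crude: any network or HTTP error retires this proxy.
        pool.mark_blocked(proxy)
        raise


# Assumed setup: one local proxy per VPN connection (ports are placeholders).
pool = ProxyPool([
    "http://127.0.0.1:8001",
    "http://127.0.0.1:8002",
    "http://127.0.0.1:8003",
])
```

In practice you would probably want to retry a failed URL on the next proxy and let blocked endpoints back in after a cooldown, but that depends on the site and the scale.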


Depending on what you scrape, we can help. We have proxies in 170+ countries; please get in touch at www.speedchecker.xyz. The pricing is cheaper than what you mention above.



