
> Most scripts are either headless Chrome instances using PhantomJS or Selenium.

I would think that using regular HTTP requests is better/more popular than running a headless browser for this kind of work. cURL, Python Requests, etc.




That is more desirable, but with more JS rendering on the front end and anti-botting measures by retailers, it's not reliable anymore.


JS rendering on the front end surely doesn't impact calls to the back-end that much (HTTP POST, REST API, GraphQL etc.)?

These don't need a headless browser.
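
For what it's worth, here is a minimal sketch of that kind of direct back-end call, using only Python's standard library. The endpoint URL, the `product_id` field, and the function names are made up for illustration; a real site's API will look different and may require extra headers or tokens.

```python
import json
import urllib.request

# Hypothetical endpoint, for illustration only.
API_URL = "https://shop.example.com/api/stock"


def build_stock_request(product_id: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request to a back-end endpoint."""
    body = json.dumps({"product_id": product_id}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


def fetch_stock(product_id: str, timeout: float = 5.0) -> dict:
    """Send the request and decode the JSON reply (no browser involved)."""
    request = build_stock_request(product_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Splitting "build" from "send" makes it easy to inspect exactly what goes over the wire before anything is sent.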


When a store is selling 20 pairs of a shoe, they will literally look at how the person who bought the pair did their web execution. Using something like Python Requests is faster, sure, but most of the time looking more "real" is more important. I've written a few of these and do both depending on the company (size etc.), their website, their security etc. Some of these companies have gone as far as setting up a web game people play and the highest scores get a pair of the shoe (slamjamsocialism).


Can you go into more detail about this game? First I've heard of it, and it clearly isn't a popular choice as opposed to say, a splash screen or other methods of deterring bots.


Yeah. At first it was Pong. People figured out how to send fake scores really easily, myself included. It was just a packed JS file, and you could deobfuscate it, pull in the score-generating JS, and send a fake HTTP request to their server. This time, sjs did some type of side-scroller with a dinosaur jumping over buildings. Link here: http://www.slamjamsocialism.com/arcad-ism/


Not in every case, you're right, but it's something to keep in the toolkit. Sometimes it's easier to just check the DOM after it's fully rendered than it is to explore all of the calls made by each individual site you're scraping.

What you're saying makes sense if you're looking at a handful of sites, but it doesn't scale very well compared to a universal solution for monitoring several retailers at once. It's definitely the more wasteful solution, if you are concerned about bandwidth and resource consumption, though.
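
As a rough sketch of the "check the rendered DOM" approach: assuming a headless browser has already handed back the fully rendered page as a string, something like the following pulls the text out of one element by its `id`. The `TextById` class, the `text_by_id` helper, and the `stock` id are hypothetical names for illustration, and this only uses the standard-library `html.parser`, not any particular scraping framework.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TextById(HTMLParser):
    """Collect the text inside the first element with a given id."""

    def __init__(self, target_id: str):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # > 0 while we are inside the target element
        self.chunks = []    # text fragments found inside the target
        self.done = False   # set once the target element has closed

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth:
            if tag not in VOID_TAGS:
                self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def text_by_id(rendered_html: str, target_id: str) -> str:
    """Return the stripped text of the first element whose id matches."""
    parser = TextById(target_id)
    parser.feed(rendered_html)
    parser.close()
    return "".join(parser.chunks).strip()
```

The same helper works on any retailer's page as long as you know which element to look at, which is the "universal" appeal; the cost is rendering the whole page first.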


I know I don't go into much depth about what kind of changes were applied, but you can read about some of them here: https://www.juusohaavisto.com/northern-nike-nabob.html


mmmmm little impractical. before I explored that option I explored arbitrage efforts like finding web server location and hosting my bots close by.

again, minimal sophistication is required to dominate. most of the developers are not doing deep exploration. you just need to be faster than someone moving a mouse or using an auto-fill script.
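
A quick way to compare candidate hosting locations is to time TCP connects to the target from each one. This is only a sketch under assumptions: the function names are made up, TCP connect time is a rough proxy for network distance (it ignores TLS and server processing), and you would run it from each candidate machine against whatever host you care about.

```python
import socket
import time


def tcp_connect_ms(host: str, port: int = 443, timeout: float = 3.0) -> float:
    """Time a single TCP connect to host:port, in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # connection established; close it immediately
    return (time.perf_counter() - start) * 1000.0


def median_connect_ms(host: str, port: int = 443, samples: int = 5) -> float:
    """Take several samples and return the middle one to smooth out jitter.

    For an even number of samples this is the upper of the two middle values.
    """
    times = sorted(tcp_connect_ms(host, port) for _ in range(samples))
    return times[len(times) // 2]
```

Running `median_connect_ms("shop.example.com")` (a placeholder hostname) from a few regions and picking the lowest number is about as sophisticated as this needs to be.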



