> Most scripts are headless browser instances using either PhantomJS or Selenium.
I would think that using regular HTTP requests is better/more popular than running a headless browser for this kind of work. cURL, Python Requests, etc.
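For what it's worth, the plain-HTTP version of a stock monitor is only a few lines. Everything below (the URL, the JSON field name, the polling interval) is made up for illustration:

```python
import time

import requests

# Hypothetical product endpoint -- real retailers expose different APIs/markup.
PRODUCT_URL = "https://example-store.com/api/product/air-max-1"

session = requests.Session()
session.headers.update({"User-Agent": "Mozilla/5.0"})  # look like a normal browser

while True:
    resp = session.get(PRODUCT_URL, timeout=5)
    resp.raise_for_status()
    data = resp.json()
    # Field name is an assumption; adjust to whatever the store actually returns.
    if data.get("in_stock"):
        print("Restock detected, trigger checkout flow here")
        break
    time.sleep(2)  # poll politely; shorter intervals risk rate limiting or bans
```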
When a store is selling 20 pairs of a shoe, they will literally look at how the person who bought a pair executed the purchase on the web. Using something like Python Requests is faster, sure, but most of the time looking more "real" is more important. I've written a few of these and do both, depending on the company (size etc.), their website, their security, etc. Some of these companies have gone as far as setting up a web game people play, where the highest scores win a pair of the shoe (slamjamsocialism).
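For context, the "look real" route usually means driving an actual browser. A rough Selenium sketch, with a placeholder URL, placeholder selectors, and made-up human-ish delays:

```python
import random
import time

from selenium import webdriver
from selenium.webdriver.common.by import By

# A full (non-headless) Chrome session: real rendering, real cookies, real TLS
# fingerprint -- slower than raw requests but harder to flag as a bot.
driver = webdriver.Chrome()
driver.get("https://example-store.com/product/air-max-1")  # placeholder URL

time.sleep(random.uniform(1.5, 3.0))  # human-ish pause before interacting

# Placeholder selectors -- every store's markup is different.
driver.find_element(By.CSS_SELECTOR, "[data-size='10']").click()
time.sleep(random.uniform(0.5, 1.5))
driver.find_element(By.ID, "add-to-cart").click()
```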
Can you go into more detail about this game? First I've heard of it, and it clearly isn't a popular choice compared to, say, a splash screen or other methods of deterring bots.
Yeah. At first it was Pong. People figured out how to send fake scores really easily, myself included. It was just a packed JS file: you could deobfuscate it, pull out the score-generating JS, and send a fake score to their server yourself. This time, SJS did some kind of side-scroller with a dinosaur jumping over buildings. Link here: http://www.slamjamsocialism.com/arcad-ism/
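To be clear about what "send fake scores" means in practice: once the score-submission logic is pulled out of the deobfuscated JS, you just reproduce the same request yourself. Everything below (the endpoint path, the payload fields, the token) is hypothetical:

```python
import requests

# Hypothetical endpoint and payload -- the real obfuscated client computed some
# token/checksum alongside the score; once that logic is extracted from the
# deobfuscated JS, the same submission can be reproduced outside the game.
SCORE_URL = "https://www.slamjamsocialism.com/arcade/score"  # made-up path

payload = {
    "score": 99999,
    "token": "value-reproduced-from-the-deobfuscated-JS",  # placeholder
}

resp = requests.post(SCORE_URL, json=payload, timeout=5)
print(resp.status_code, resp.text)
```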
Not in every case, you're right, but it's something to keep in the toolkit. Sometimes it's easier to just check the DOM after it's fully rendered than it is to explore all of the calls made by each individual site you're scraping.
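Roughly what I mean by checking the rendered DOM, assuming headless Chrome via Selenium; the URL and selector are placeholders:

```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

options = Options()
options.add_argument("--headless=new")  # render JS-heavy pages without a visible window
driver = webdriver.Chrome(options=options)

driver.get("https://example-store.com/product/air-max-1")  # placeholder URL

# Wait until the stock widget is actually in the rendered DOM instead of
# reverse-engineering whichever XHR/fetch call populates it.
element = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, ".stock-status"))  # placeholder selector
)
print(element.text)
driver.quit()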
What you're saying makes sense if you're looking at a handful of sites, but it doesn't scale very well compared to a universal solution for monitoring several retailers at once. It's definitely the more wasteful option if you're concerned about bandwidth and resource consumption, though.
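If by "universal solution" you mean rendering every page the same way and checking the resulting DOM, a sketch like this (placeholder URLs, naive stock check) covers multiple retailers with no per-site reverse engineering:

```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Placeholder product URLs -- the point is that the same rendered-DOM check
# works for every retailer, at the cost of running a full browser.
PRODUCT_PAGES = [
    "https://store-a.example/product/123",
    "https://store-b.example/product/456",
]

options = Options()
options.add_argument("--headless=new")
driver = webdriver.Chrome(options=options)

for url in PRODUCT_PAGES:
    driver.get(url)
    # Crude generic signal; a real monitor would need something sturdier.
    in_stock = "add to cart" in driver.page_source.lower()
    print(url, "in stock" if in_stock else "not available")

driver.quit()
```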
Mmmmm, a little impractical. Before I explored that option, I explored arbitrage efforts like finding the web server's location and hosting my bots close by.
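For anyone curious how you'd pick where to host: the crude version is just resolving the store's hostname and timing TCP handshakes from each candidate box. The hostname below is a placeholder:

```python
import socket
import time

# Hypothetical target -- resolve the store's hostname and time a few TCP
# handshakes to estimate how close this machine is to the origin (or its
# edge/load balancer).
HOST = "example-store.com"
PORT = 443

ip = socket.gethostbyname(HOST)
samples = []
for _ in range(5):
    start = time.perf_counter()
    with socket.create_connection((ip, PORT), timeout=3):
        pass
    samples.append((time.perf_counter() - start) * 1000)

print(f"{HOST} ({ip}): median handshake {sorted(samples)[len(samples) // 2]:.1f} ms")
```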
Again, minimal sophistication is required to dominate. Most of the developers are not doing deep exploration; you just need to be faster than someone moving a mouse or using an auto-fill script.