If so, using wget is a poor solution. I have not used wget in over a decade but as I recall it does not do HTTP pipelining; I could be wrong on that - please correct me.
I do recall with certainty that when wget was first written and disseminated in the 1990s, "webmasters" wanted to ban it. httpds were not as resilient then as they are today, nor were bandwidth and hardware as inexpensive.
HTTP pipelining is a smarter alternative to burdening the remote host with thousands of consecutive or simultaneous connections.
Depending on the remote host's httpd settings, HTTP pipelining usually lets you make 100 or maybe more requests over a single connection. It can be accomplished with only a simple tcpclient like the original nc and the shell.
In any event, the line about a "distributed crawler" is spot on. Never underestimate the power of marketing to suspend common sense.
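To make the pipelining idea concrete, here is a rough sketch of how it might look with printf and nc. The function name pipeline_requests, the host example.com and the paths are all made up for illustration, and it assumes the server accepts pipelined HTTP/1.1 requests on port 80. The last request carries "Connection: close" so the server hangs up after the final response.

```shell
#!/bin/sh
# Hypothetical helper: print one HTTP/1.1 GET request per path, back to back,
# all addressed to the same host. Nothing is sent over the network here.
pipeline_requests() {
    host=$1
    shift
    while [ "$#" -gt 0 ]; do
        printf 'GET %s HTTP/1.1\r\nHost: %s\r\n' "$1" "$host"
        # Only the final request asks the server to close the connection.
        if [ "$#" -eq 1 ]; then
            printf 'Connection: close\r\n'
        fi
        # A blank line ends each request's headers.
        printf '\r\n'
        shift
    done
}

# Example use (assumed host and paths): three requests, one TCP connection.
# pipeline_requests example.com /a.html /b.html /c.html | nc example.com 80 > responses.txt
```

The responses come back concatenated in request order, so they still have to be split apart afterwards. nc variants also differ in how they behave once their input ends, so some may need an extra option to keep reading until the server closes the connection.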
Also, I find that I can often speed my scripts up a little by using exec in shell pipelines, e.g., util1 |exec util2 or exec util1 |exec util2.
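For anyone who has not seen the exec trick, a small sketch with tr and sort standing in for util1 and util2. It assumes a shell such as dash, or bash in its default configuration, where every stage of a pipeline runs in its own subshell. In a shell that runs the last stage in the current shell, exec there would replace the shell itself. The function names plain and with_exec are made up for the comparison.

```shell
#!/bin/sh
# Ordinary pipeline: each stage gets a subshell, which may then fork again
# to run the utility, depending on the shell.
plain() {
    printf 'b\na\n' | tr a-z A-Z | sort
}

# Same pipeline with exec: each stage's subshell is replaced by the utility,
# so no additional process is created for that stage.
with_exec() {
    printf 'b\na\n' | exec tr a-z A-Z | exec sort
}
```

Both functions produce the same output. Whether the exec version is measurably faster depends on the shell, since some shells already skip the extra fork on their own, so it is worth timing a loop of each before relying on it.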
There are other, better approaches besides using the builtin exec, but I will leave those for another day.