I created a web application and set of scripts late last year to snapshot sites like that on a daily/hourly/minutely interval. Also set up the web app to manage the captured images and turn them into videos.
Some of the interesting things I found:
- interesting to compare news sites' coverage of the same stories - see who publishes first and where on the page...
- quickly analyse site ad and content refresh rates
- instant time lapse videos from web cam sites
- some interesting artistic effects as content changes and moves on sites
- the "photographic record" of web sites revealed some sites failing to update, or breaking outright, at times
- very easy to generate gigabytes of content in small amounts of time!
I haven't had time to extend the project further right now, but I still have jobs running that capture some of the top sites daily, so I can build some year-long web-time-lapse videos and do something with the content. If anyone has ideas to commercialize the content or technology, let me know.
Note: Technologies used - Ubuntu, bash, CutyCapt, JSP, ImageMagick
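For anyone curious, the pipeline above can be sketched in a few lines of bash. This is a minimal, illustrative version, not the author's actual scripts: the URL, paths, and timings are placeholders, and it assumes the Ubuntu `cutycapt` and ImageMagick packages are installed (the external tools are guarded so the script degrades gracefully without them).

```shell
#!/usr/bin/env bash
# Sketch of the capture pipeline: snapshot a page, then stitch frames
# into a time lapse. URL, paths, and delay are illustrative.
URL="http://www.nytimes.com"
OUTDIR="frames"
mkdir -p "$OUTDIR"

# Timestamped filename so frames sort chronologically on disk.
STAMP=$(date +%Y%m%d-%H%M%S)

# One snapshot; schedule this script from cron for daily/hourly/minutely runs.
if command -v cutycapt >/dev/null 2>&1; then
    cutycapt --url="$URL" --out="$OUTDIR/$STAMP.png" --min-width=1024
fi

# Stitch whatever frames have accumulated into an animated GIF.
if command -v convert >/dev/null 2>&1 && ls "$OUTDIR"/*.png >/dev/null 2>&1; then
    convert -delay 20 -loop 0 "$OUTDIR"/*.png timelapse.gif
fi
```

A cron entry like `0 * * * * /path/to/capture.sh` would give hourly frames; it's easy to see how gigabytes pile up fast at minutely intervals.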
I must confess, I actually watched the entire video! Much better than looking through hundreds of screen-captures in a list and interestingly entertaining!
It's striking to me the number of watch makers that advertised through the course of the year. The ads primarily caught my attention - which is strange, because I rarely look at ads when browsing sites. The one constant, from all the ads, was a watch manufacturer.
Curious if the person who captured these images had a browsing history for watches, or if that's what everyone witnessed? Next experiment: Two completely different users capture these on the same time interval -- side by side comparison! ;-)
>Curious if the person who captured these images had a browsing history for watches, or if that's what everyone witnessed?
Alas, no. I used a WebKit-to-JPEG generator that should, in theory, be pretty void of any browsing history. I'd be surprised if they've started tracking those by IP!
You shouldn't be too surprised, it's quite possible they were tracking some combination of IP, user agent, and a number of other things to identify the browser. I don't know that you would have ended up having a "watch" preference, though, especially without clicking on watch ads or visiting a watch site.
I would love to see the Times adopt a front-page layout that is more web-native and less an imitation of a printed paper's front page; at five columns across in places, it is too squished at only 970px wide.
It would actually be a great candidate for responsive design: the current column setup could look far nicer when more width is available, dropping columns where screen width is tighter, similar to http://theconversation.edu.au/, which has about the same number of columns.
I used http://code.google.com/p/wkhtmltopdf/ . It's an amazing project that I try to use as often as possible, especially whenever a client requires pdfs.
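The basic wkhtmltopdf invocation is a one-liner; a hedged example follows, with the URL and output names as placeholders (and the calls guarded in case the tools aren't installed). The project also ships a sibling `wkhtmltoimage` binary for rendering straight to an image, which fits the screenshot use case in this thread.

```shell
#!/usr/bin/env bash
# Illustrative wkhtmltopdf/wkhtmltoimage usage; URL and filenames are placeholders.
URL="http://example.com"
PDF_OUT="example.pdf"
IMG_OUT="example.png"

# Render a page to PDF (the client-deliverable case mentioned above).
if command -v wkhtmltopdf >/dev/null 2>&1; then
    wkhtmltopdf "$URL" "$PDF_OUT"
fi

# Or render the same page straight to an image.
if command -v wkhtmltoimage >/dev/null 2>&1; then
    wkhtmltoimage "$URL" "$IMG_OUT"
fi
```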
I've often found myself needing -- and thinking about building -- a webapp/site for taking/collecting screenshots of other sites. But I'm not sure it'd actually be useful, and I certainly don't know if anyone would pay for it.
That would be useful. Archive.org is better at not breaking than I imagined it would be, but still often leaves broken assets. The combination of the two would be really cool: see the page, and how the page was viewed at the time.
One odd thing about the video: it should be 7+ minutes long, but YouTube shows it as 5:36 or so. Trick: watch in 480 mode and it keeps playing to the full length :-S
You gotta keep watching so you don't miss more big news events like Osama's death!