
I know about Alexa, but 500 is too small for statistical analysis.



Look for the link there to download a list of the top million domains (according to them, of course).

Edit: http://s3.amazonaws.com/alexa-static/top-1m.csv.zip
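For a larger sample, a minimal Python sketch for reading that archive might look like this. It assumes the zip (or a saved copy of it) holds a single CSV file with no header and rows of the form `1,google.com`. The function name `load_top_sites` and the `limit` parameter are illustrative, not part of anything Alexa ships.

```python
import csv
import io
import zipfile


def load_top_sites(zip_source, limit=None):
    """Return a list of (rank, domain) tuples from an Alexa-style top-1m zip.

    zip_source may be a file path or a binary file-like object.
    Assumption: the archive holds one CSV whose rows look like "1,google.com".
    """
    sites = []
    with zipfile.ZipFile(zip_source) as archive:
        # Assumption: the first (and only) member is the CSV listing.
        member = archive.namelist()[0]
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            for row in csv.reader(text):
                if not row:
                    continue  # skip blank lines
                sites.append((int(row[0]), row[1].strip()))
                if limit is not None and len(sites) >= limit:
                    break  # stop early instead of parsing all 1M rows
    return sites
```

For example, `load_top_sites("top-1m.csv.zip", limit=10000)` would give the top 10,000 domains, which is already far more than 500 for a statistical test.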


Thanks.


If the blocklist is manually curated, then the probability of a website being blocked will depend on its popularity. I wouldn't just look at "X% of sites blocked"; I'd also look at "sites accounting for Y% of web traffic blocked", etc.
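A rough sketch of that comparison in Python: compute both the plain share of sites blocked and a traffic-weighted share. The ranked list only gives ranks, not traffic, so the default weight of `1/rank` (a Zipf-like curve) is purely an assumption standing in for real traffic figures; `blocked_shares` is a hypothetical helper name.

```python
def blocked_shares(ranked_domains, blocked, weight=lambda rank: 1.0 / rank):
    """Return (share of sites blocked, share of estimated traffic blocked).

    ranked_domains: iterable of (rank, domain) pairs.
    blocked: set of domains observed to be blocked.
    weight: traffic proxy per rank; the default 1/rank is an assumption,
    not measured traffic data.
    """
    total_sites = blocked_sites = 0
    total_weight = blocked_weight = 0.0
    for rank, domain in ranked_domains:
        w = weight(rank)
        total_sites += 1
        total_weight += w
        if domain in blocked:
            blocked_sites += 1
            blocked_weight += w
    if total_sites == 0:
        return 0.0, 0.0
    return blocked_sites / total_sites, blocked_weight / total_weight
```

If blocking is concentrated on popular sites, the traffic-weighted share comes out well above the plain site share, which is exactly the effect a manually curated list would produce.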


It is a combination of manual and automatic blocking. Facebook censorship is manual. The Dick Cheney Wikipedia page, on the other hand, is blocked because they have added "Dick" to their automatic blacklist, so the page gets censored regardless of context.
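A context-free keyword filter of the kind described can be sketched in a few lines of Python. This assumes the filter matches blacklisted words as case-insensitive substrings of the requested URL; the real filter may also inspect page content, and `keyword_blocked` is a hypothetical name.

```python
from urllib.parse import unquote


def keyword_blocked(url, blacklist):
    """Return the first blacklisted keyword found in the URL, or None.

    Assumption: a naive, context-free filter that does a case-insensitive
    substring match on the decoded URL, with no idea what the page is about.
    """
    text = unquote(url).lower()  # decode %xx escapes before matching
    for word in blacklist:
        if word.lower() in text:
            return word
    return None
```

Because it is a plain substring test, a page about Charles Dickens trips the same rule as the Dick Cheney article, which is the "regardless of context" problem in a nutshell.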


So you can't connect to Wikipedia using HTTPS? What's the policy on HTTPS in general?

Edit: Never mind, you already answered it in another comment.


We can use HTTPS, but it is usually slower and less reliable. If you are uploading a 10 MB attachment to Gmail over HTTPS, you should expect it to time out and fail 4 or 5 times before you either succeed or give up. With HTTP there is usually no problem. When I was downloading some large files from S3, I noticed that the transfer speed was 10-15 kB/s. I changed the URL to use HTTP and immediately got a 4x speedup (almost the nominal speed my ISP offers). Sometimes HTTPS is almost as good as HTTP, but usually it is 3-5x slower. Around special occasions (election days, etc.) it is so slow that you get a timeout error 9 times out of 10.
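One way to reproduce that S3 comparison is to fetch the same object over both schemes and compare average throughput. The Python sketch below uses only the standard library; the helper names are hypothetical, and a single download per scheme is a noisy measurement, so repeating it several times would give a fairer picture.

```python
import time
import urllib.request
from urllib.parse import urlsplit


def with_scheme(url, scheme):
    """Return url with its scheme swapped (e.g. https -> http)."""
    return urlsplit(url)._replace(scheme=scheme).geturl()


def measure_kbps(url, fetch=urllib.request.urlopen, chunk_size=64 * 1024):
    """Download url once and return the average throughput in kB/s."""
    received = 0
    start = time.perf_counter()
    with fetch(url) as response:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            received += len(chunk)
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid dividing by zero
    return received / 1000 / elapsed


def compare_schemes(url, **kwargs):
    """Return (https_kbps, http_kbps, speedup of HTTP over HTTPS)."""
    https_kbps = measure_kbps(with_scheme(url, "https"), **kwargs)
    http_kbps = measure_kbps(with_scheme(url, "http"), **kwargs)
    return https_kbps, http_kbps, http_kbps / https_kbps
```

With the numbers above, 12 kB/s over HTTPS against roughly 48 kB/s over HTTP would show up as a speedup of about 4.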


Alexa also has a list of the top 1,000,000 sites, updated daily:

http://s3.amazonaws.com/alexa-static/top-1m.csv.zip



