
As someone who lives in Iran, I find this sad but not news. I have gotten used to seeing half the websites blocked by my government (Facebook, YouTube, Flickr, WordPress, etc.) and the other half by your government (Java SE or anything else from Oracle, Google Code, Google Play Store, anything from Xilinx, etc.).

If one of my favorite websites were blocked, I might consider not using it anymore. When virtually all websites are blocked, I can either not use the internet or find a way around it. Of course I chose the second option. Most Iranians have been using proxies and VPNs for the past few years, so this blockage would not affect us much.

P.S. Please stop using Google Code. Edit: Also App Engine. Udacity has been inaccessible to Iranians since the beginning because they use App Engine for hosting. This is what I get when I try to access Udacity: http://i.imgur.com/zUecPHk.png

P.P.S. I am curious what percentage of the internet is blocked in Iran. When you try to access a blocked website, the censorship system shows a page explaining that the website is blocked and some links to Iranian websites. Is it possible to write a script to scan all the internet (or at least the popular websites) and determine which ones are blocked? Here is what I get when I try to access YouTube: http://git.io/HG3nsQ

I have two questions:

1. Where can I find a list of all domain names, top 1000, top 100000?

2. Is it possible to conclusively determine censorship from headers only or do I have to load the whole page and compare HTML code with a sample? Bandwidth is very expensive here.




Is it possible to write a script to scan all the internet (or at least the popular websites) and determine which ones are blocked?

If you can find or make a list of websites you want to scan, you can script it. The biggest problem is doing it in a way that doesn't bring you to the attention of those doing the blocking.

1. Where can I find a list of all domain names, top 1000, top 100000?

Alexa's "top 1,000,000" list (~10.2 MB download) is at http://s3.amazonaws.com/alexa-static/top-1m.csv.zip
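A minimal sketch for pulling the first N domains out of that archive, assuming it contains a single CSV file whose rows look like `rank,domain` and are already sorted by rank (check the layout after downloading; the function name `top_domains` is just for illustration):

```python
import csv
import io
import zipfile


def top_domains(zip_bytes: bytes, n: int) -> list[str]:
    """Return the first n domains from an Alexa-style top-1m.csv.zip.

    Assumes the archive holds one CSV member with "rank,domain" rows,
    already sorted by rank.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Assume the first member is the CSV we want.
        member = archive.namelist()[0]
        with io.TextIOWrapper(archive.open(member), encoding="utf-8", newline="") as text:
            domains = []
            for row in csv.reader(text):
                if len(domains) >= n:
                    break
                if len(row) >= 2:  # skip blank or malformed rows
                    domains.append(row[1].strip())
            return domains
```

For example, `top_domains(open("top-1m.csv.zip", "rb").read(), 1000)` would give the top 1000 list the question asks about, and a larger `n` gives the bigger samples.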

2. Is it possible to conclusively determine censorship from headers only or do I have to load the whole page and compare HTML code with a sample? Bandwidth is very expensive here.

It depends on the method used to block you from visiting a website.

If DNS-based blocking is used, you can use very small DNS lookups to identify whether or not a website is blocked — all of the hostnames of blocked websites will probably resolve to the same IP address. (You can check this with "nslookup www.website.com" in Windows or "host www.website.com" on Linux, OS X, etc.) If this method works, it's probably the best way — DNS requests are less likely to be logged than HTTP requests, and DNS requests and responses are small.
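A rough sketch of that DNS check in Python, using only the system resolver (IPv4 answers only). The helper names and the `min_hosts` threshold are made up for illustration. One caveat: big CDNs legitimately put many unrelated sites on the same address, so a shared IP is only a hint; comparing against the answer from a resolver outside the country (or against the block-page address, once you know it) is more reliable.

```python
import socket
from collections import defaultdict


def resolve_all(hostnames):
    """Resolve each hostname to an IPv4 address; None if the lookup fails."""
    results = {}
    for host in hostnames:
        try:
            results[host] = socket.gethostbyname(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            results[host] = None
    return results


def suspicious_ips(resolved, min_hosts=3):
    """Group hostnames by address and return the addresses shared by at
    least min_hosts hostnames, a hint of a forged "blocked" answer."""
    by_ip = defaultdict(list)
    for host, ip in resolved.items():
        if ip is not None:
            by_ip[ip].append(host)
    return {ip: sorted(hosts) for ip, hosts in by_ip.items() if len(hosts) >= min_hosts}
```

Usage would be something like `suspicious_ips(resolve_all(["www.youtube.com", "www.facebook.com", "www.flickr.com"]))`; each lookup is a single small UDP exchange.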

If the blocking uses a transparent proxy instead of forged DNS records, you could use HTTP HEAD requests and match against the "Server" header in the reply:

    Server: Apache/2.2.12 (Unix) mod_ssl/2.2.12 OpenSSL/0.9.7d mod_wsgi/3.2 mod_perl/1.29 PHP/4.4.1
The software listed in that "Server" header is terribly old, and I doubt you'll find any other web server on the Internet with that exact combination of software versions. So that could be a way to identify the server serving the "website blocked" page without downloading entire pages, but it might draw attention to you if you do it for thousands of websites.
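A minimal sketch of the HEAD approach, assuming the interception happens on plain HTTP (port 80) and that the block page sends the same `Server` header for HEAD as for GET; neither is confirmed above. The function names are hypothetical:

```python
import http.client

# Server header of the "website blocked" page, as quoted above.
BLOCK_PAGE_SERVER = ("Apache/2.2.12 (Unix) mod_ssl/2.2.12 OpenSSL/0.9.7d "
                     "mod_wsgi/3.2 mod_perl/1.29 PHP/4.4.1")


def head_server_header(host, timeout=10.0):
    """Send a plain-HTTP HEAD request for / and return the Server header (or None)."""
    conn = http.client.HTTPConnection(host, 80, timeout=timeout)
    try:
        conn.request("HEAD", "/")
        resp = conn.getresponse()  # HEAD responses carry no body
        return resp.getheader("Server")
    finally:
        conn.close()


def is_block_page(server_header):
    """True if the Server header matches the block page's exact signature."""
    return server_header is not None and server_header.strip() == BLOCK_PAGE_SERVER
```

Something like `is_block_page(head_server_header("www.example.com"))` then costs one small request and response per site rather than a full page load.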


"...The biggest problem is doing it in a way that doesn't bring you to the attention of those doing the blocking...."

I think this is a HUGE issue that should not be taken lightly. A guy scanning certain websites from Iran IS going to attract some attention no matter how benign his motives. It just won't be taken lightly. That attention can land you on lists you don't want to be on.

I'm not saying that I don't sympathize with his/her situation... I just think that certain actions can be viewed as hostile by people with a security mindset. It may even increase the number of sites being blocked, as well as SEVERELY restrict his/her ability to travel without being arrested. And if you attract enough of the right attention... you may find that being arrested is the least of your worries.

And all of this doesn't even take into account what Iranian authorities may do from their end.

Advice like this, given on a public forum via easily identifiable pseudonyms, should be taken with a BIG grain of salt.


Having lived the first two decades of my life in Iran, and having naturally had to circumvent network blocking, I can tell you that's not how they work. Most of the blocking they do is targeted at the masses, and most people actually do circumvent it. People who circumvent their internet blocking facilities do not generally face persecution, as that's basically 100% of internet users.


I was referring, mostly, to what American authorities would think of an Iranian IP address port scanning web servers. That will get the attention of American authorities... and not in a good way.

You just don't go port scanning and probing willy-nilly in the US. That's DOUBLY true if you are port scanning and probing sites that the US government has blocked... AND you are doing it from inside Iran.

You're just BEGGING for Homeland Security to take a closer look at you. It's very foolish.

You may know your Government... but I know mine. I can tell you that an Iranian probing sites whose access from Iran is blocked by the American government for security reasons... that's not bright. Authorities here will not take kindly to it.


Exactly.

گر حکم شود که مست گیرند

در شهر هر آنکه هست گیرند

(If they decree that the drunk be arrested, they'd have to arrest everyone in the city.)

Sorry, I couldn't help citing this particular piece of Persian poetry. Trust me, it's relevant.


"If they tell you to get drunk, everyone in the city is the boss"?


The literal translation would be something along the lines of "if they rule to arrest drunk people, they'd have to arrest everyone in the city."


Thanks. They do not use DNS-based blocking. I will try using the HEAD method if I find a way to do it anonymously.


If you have the option, writing it to look for instances of US blocking, so that it only incidentally finds local censorship, may give you some ass-coverage.


Be careful, I don't think there would be a good way to do this anonymously without distributing the workload.


If you would like me to, I can set up a Tor bridge for you. Unfortunately, YCombinator doesn't have a private messaging system, so we'll have to figure out a way to communicate the details securely. Cryptocat is blocked. If you're familiar with GPG, we could use that right here. IRC is possible too. Email is not safe.

I would not recommend doing a mass censorship scan from your own IP. It's a given that one or more of the top 100,000 sites will trigger some kind of flag, apart from the fact that such activity may itself mark you as a person of interest.

Another thing to consider is that the government can likely link your YCombinator account to you, because there are few YCombinator users in Iran, and of that subset only a small number (maybe one) match your posting timestamps. I'm of course assuming that they keep such traffic logs. Syria's surveillance system did/does.


I'm not going to support breaking security, but it seems like it would be hard to identify dissidents if Windows viruses sometimes accessed random sensitive addresses (I'm thinking of things like the Danish cartoons, democracy/atheism information, how to change religion, etc.). Maybe only when running in Iran or Saudi Arabia.

Something similar could work for China, but with a different hotlist.


Hi! Cuban here, in pretty much the same situation as you. I feel your pain: every time I see the little broken robot when attempting to get anything from Google Code, I'm thankful we have GitHub and Bitbucket.

If GitHub gets blocked, we should get something going ourselves. If only the governments of blocked countries took this issue seriously and had these essential services covered, but I guess something as amazing as GitHub takes a real startup and not some lame government-founded dev group.

Anyway, just saying, we blocked people should hang around more often.


Forgive my ignorance, but who is doing the blocking? Is it American sites willingly censoring for legal reasons, or is the Cuban government blocking them for the country's own ISP?


American sites willingly censoring for legal reasons


If you're going to perform a comprehensive scan (i.e. not just to sample alexa's 1M), there's a lot of crap waiting for you in the long tail -- you may want to use my subset of alexa's rankings instead, which contains only names that have been on the list for the last 322 days (it's ~700K rows): http://www.szejda.pl/pub/alexa-20130313-20140128.bz2
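One way to build a similar "stable" subset yourself (not necessarily how this particular file was made) is to intersect daily snapshots of the list. A sketch, assuming each snapshot has already been parsed into a list of domains in rank order, oldest snapshot first:

```python
def stable_domains(snapshots):
    """Return domains present in every daily snapshot, ordered by their
    rank in the newest snapshot.

    snapshots: list of lists of domains, oldest day first, each in rank order.
    """
    if not snapshots:
        return []
    common = set(snapshots[0])
    for day in snapshots[1:]:
        common &= set(day)  # drop anything missing on any day
    return [d for d in snapshots[-1] if d in common]
```

Intersecting sets keeps memory proportional to one day's list, so a few hundred daily files can be streamed one at a time.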


1. Where can I find a list of all domain names, top 1000, top 100000?

Alexa (http://www.alexa.com/topsites) can provide you with data for the "top 500".


I know about Alexa, but 500 is too small for statistical analysis.


Look for the link there to download a list of the top million domains (according to them, of course).

Edit: http://s3.amazonaws.com/alexa-static/top-1m.csv.zip


Thanks.


If the blocklist is manually curated then the probability of a website being blocked will depend on its popularity. I wouldn't just be interested in "X% of sites blocked," I'd look at "Sites seeing Y% of web traffic blocked" etc.
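A sketch of computing both numbers side by side. The Alexa list only gives ranks, not traffic, so this ASSUMES traffic falls off roughly as 1/rank (a Zipf-like curve); treat the weighted figure as a rough estimate, not a measurement. The function name is hypothetical:

```python
def blocked_shares(ranked_results):
    """ranked_results: list of (rank, blocked) pairs, ranks starting at 1.

    Returns (share of sites blocked, estimated share of traffic blocked).
    Traffic per site is ASSUMED proportional to 1/rank, since a rank list
    alone carries no traffic figures.
    """
    if not ranked_results:
        return 0.0, 0.0
    site_share = sum(1 for _, blocked in ranked_results if blocked) / len(ranked_results)
    total_weight = sum(1.0 / rank for rank, _ in ranked_results)
    blocked_weight = sum(1.0 / rank for rank, blocked in ranked_results if blocked)
    return site_share, blocked_weight / total_weight
```

With four sites where only the #1 site is blocked, this reports 25% of sites but about 48% of (assumed) traffic, which is exactly the gap the comment is pointing at.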


It is a combination of manual and automatic blocking. Facebook censorship is manual. The Dick Cheney Wikipedia page is blocked because they have added "Dick" to their automatic blacklist, so it gets censored regardless of context.
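A toy illustration of why a context-free keyword list produces this kind of collateral blocking. The keyword set here is hypothetical, and the real filter's matching rules aren't public; this simply does a case-insensitive substring match on the URL. A filter like this needs to see the full URL, which on HTTPS it generally can't, since the path is encrypted:

```python
# HYPOTHETICAL keyword list for illustration; the real list is not public.
BLACKLIST = {"dick"}


def keyword_blocked(url, blacklist=BLACKLIST):
    """Naive context-free filter: block if any keyword appears anywhere in the URL."""
    lowered = url.lower()
    return any(word in lowered for word in blacklist)
```

The same rule that catches the Dick Cheney article also catches Moby-Dick, which is what "regardless of context" means in practice.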


So you can't connect to Wikipedia using HTTPS? What's the policy on HTTPS in general?

Edit: Never mind, you already answered it in another comment.


We can use HTTPS, but it is usually slower and less reliable. If you are uploading a 10 MB attachment to gmail using HTTPS, you should expect it to timeout and fail 4 or 5 times before you either succeed or give up. With HTTP there is usually no problem. When I was downloading some large files from S3, I noticed that the transfer speed was 10-15 kB/s. I changed the URL to use HTTP and immediately got a 4x speedup (almost the nominal speed my ISP offers). Sometimes HTTPS is almost as good as HTTP. Usually it is 3-5x slower. Near special occasions (election days, etc) it is so slow you would get a timeout error 9 times out of 10.


Alexa also has the top 1,000,000 sites, updated daily:

http://s3.amazonaws.com/alexa-static/top-1m.csv.zip


How common are VPNs?


The government has tried blocking all anti-censorship technologies, so normal PPTP and L2TP VPN services will not work. VPN vendors now offer their own software, which I assume uses unconventional ports and settings to get around the blocking. But VPNs are not the only solution. Most people use software like Psiphon or Freegate to access the internet. I don't know about the general population, but everyone I know uses some kind of anti-censorship solution.

Facebook is censored in Iran, yet it is very popular. If you want to roughly estimate how many people circumvent censorship on a daily basis, just find out how many Iranians actively use Facebook. I guess it would be possible to find a number with Graph Search.


Here in Brazil we have many laws that have not "caught on", meaning nobody follows them and the government doesn't bother to enforce them. Of course that is bad, but sometimes these laws are very stupid, and that's why things are the way they are.

In Iran, with the censorship, could the same thing be happening? Someone created it, the government sees some uses for it, and an organization was tasked with enforcing it, but since there are many ways to circumvent it and everybody knows how, maybe this organization doesn't actively care enough to update its filter. That is, it's there because someone, someday, had the stupid idea of creating a big firewall, but there is less and less support for actually making it real and effective?

Would I be right in assuming this much?


The situation is quite different in Iran. Traditionally the government has had control over all media. The Ministry of Culture and Islamic Guidance reviews all books before publishing and may remove the parts it doesn't like or prevent books from being published altogether. The same goes for newspapers and magazines, except they aren't reviewed before publishing; if the authorities find something offensive, they close the newspaper. The only entity allowed to operate TV or radio channels is Islamic Republic of Iran Broadcasting (IRIB), which is part of the government, etc.

When the internet became popular, the government's monopoly was threatened. Since then they have done everything in their power to restrict its use. First they passed regulations that forced private ISPs to buy their bandwidth through a government organization and deployed censorship software (rumored to be Chinese) on all of it, so blocking is national, not per ISP. Then they passed a law restricting home users to 128 kbps (yes, kilobits per second, not even kilobytes). Then they criminalized providing anti-censorship solutions (but not using them, although that is debated). On some occasions (like after the 2009 election), they make the internet so slow it is virtually impossible to use. HTTPS traffic is always slower than HTTP and occasionally completely blocked.

It is all about maintaining power to control the narrative. As I said, it hasn't worked as well as they expected. Now they are building something called the National Internet. They say they don't plan to block access to the internet, but I am not so hopeful.

It has been a decade-long battle between government and freedom of information and speech. Most people who are affected aren't dissidents, but simply people who want to update their Facebook status.

It is sad because we are a fairly developed country. There is no war or famine, our healthcare system is good, we have powerful industries, good universities, big cities with good public transportation and interstate highways, etc.

The problem is that over the years since the Islamic Revolution our nation has become more liberal in general, while the government remains rigidly conservative. It would take a long answer to describe where we are and how we got here, but I think this short comment is enough to answer your question.


Yeah, typically the trick here is to run OpenVPN on port 443/TCP (the HTTPS port, which almost no one bothers to examine); you can also wrap it in stunnel if by some chance they're doing deep packet inspection and blocking OpenVPN connections.


AirVPN has obfsproxy, or you can rent a cheap VPN that takes bitcoins (check the Bitcoin wiki) and set up obfsproxy yourself. It obfuscates the traffic so filters can't easily recognize it as a VPN connection, which helps bypass censorship.

Pretty sure most Coursera videos and materials are ripped and available via torrent too. If not, you could rip the whole site and mirror it for free on Yandex cloud. Russia laughs at petty US sanctions.


About 4 years ago, when I was leaving Iran, you could see physical establishments (i.e. shops) publicly advertising VPN access. I'm not sure you can still sell VPN access as openly as before, but I'm quite sure that most internet users do use some sort of circumvention/proxying technology.

I, personally, was always very skeptical about the VPN services sold in the wild. Who knew whether the machine you were routing your data through was controlled by the government itself?


Very, very common. I'd say 90%+ of students at our university already use them, and other universities are probably no different.


There are no statistics, but almost everyone uses them. Facebook is very popular in Iran. There have even been some debates in parliament and government about not filtering it.


> P.S. Please stop using Google Code.

Why?


As I said, it blocks anyone coming from Iran. I think it does not solely rely on IP address, because I can't access it with my normal VPN service, even though it gives me a Canadian IP.


Make sure your DNS lookups are also routed through the VPN.


Sorry I should have been more clear, I meant, why are you singling out Google Code over the other ones?


Most HN visitors are programmers. Some may have projects hosted on Google Code, thus inaccessible to parts of the world without them even knowing. In the past, whenever I encountered a project hosted on Google Code which I needed, I contacted the developers and explained the situation. When I saw this discussion here on HN, I decided to use the opportunity to ask everyone who has a project hosted there to move it to a more international-friendly alternative like Github or Bitbucket.

I don't think Oracle, Xilinx or Google are likely to change their policies. Yet it is in the developer's power to decide where to host a project.

I would like to ask everyone who has a project hosted on Google Code, or is thinking about using it for future projects, to use another service if possible.


Perhaps because it makes some extra effort to detect VPNs and accurately block people accessing it from blocked regions? I'd imagine his frustration is that Google isn't just reluctantly putting up a dumb block to obey some external injunction, but seems to put real effort into it.


Honestly, living in China, where blogspot/wordpress.com were blocked without a VPN, was particularly annoying, so I fully support the Iranian guy who says it's a pain. The USA should realise that people don't give a damn who's at the other end. Speaking of which, look at the freedom the USA gives trade and tourists: no Cuba, no Iran, no N. Korea, etc. Come off it. Most passports are not half as restrictive; it's almost as if your government is as restrictive as N. Korea's, and you call the USA the land of freedom?


Exactly! The US (or the land of freedom) doesn't let you in if they find out you've so much as visited Cuba. That's why Cuba gives you your visa in hand (as opposed to stuck on a page in your passport), so that you can still go to the US if you want to.

That is, unless you're Cuban. In that case you can go to the US and they'll welcome you to Miami with a red carpet and fool you with what freedom and cars and big houses and laptops you'll be able to have.


While there might be broad laws and regulations out there, you have to go out of your way to implement technological measures to restrict access to a certain service. Not all services are created equal: I think it is reasonable to advocate use of services that are more passive in implementing idiotic pieces of legislation and discourage use of the ones that are more restrictive, for whatever reason. It's pure pragmatism, especially when there are alternatives like GitHub. Absorbing legal risks is part of the value a service provides.

Google in particular has been one of the hardliners when it comes to restricting access to users with Iranian IPs. There are other big US companies (like Microsoft, for instance) that are much less active in banning IPs.


How do you pay for the VPN?


Here is the strange part: with a debit card. Theoretically the government could identify and arrest all VPN providers in 24 hours. For some reason they have never made a single VPN-related arrest, as far as I know. This has led many (including me) to believe that VPNs are government honeypots. People are going to circumvent censorship one way or the other. If they provide VPNs for those who are really seeking them, they lose their power of censorship but not of surveillance. I assume they don't care so much about people watching porn as they care about citizen journalists writing for the BBC (many of whom have been arrested).


That's interesting - I assumed they would block card payments, as well.

Hell, I have trouble paying for stuff in the EU with a US card...

Anyway, VPN providers are NOT your friend for privacy and escaping surveillance/censorship. They actually cooperate with the authorities!

Hidemyass and Astrill used to openly state they will provide any and all details of your identity and activities if the authorities request it!

There's no need for honeypots when all these companies give up your identity with a single call from the police.

They also discriminate against automated tools and robots for whatever reason, as if paying a human to do the work is better :-)


But in this case, it's the US blocking Iran, not Iran censoring itself.


Many VPN providers these days accept Bitcoin, just buy it locally and send it around a little and you're good to go. Though I doubt the government cares enough to look into everyone who buys VPN via other means anyways.



