One of the authors here. The intention was to provide a primer on the topics, since we figured this would draw people from a nontechnical background too.
That and half the information on the internet about VPNs is from VPN providers and is incorrect or not technical enough to describe how they _actually_ work.
We had a sentence in the intro that was supposed to be a hyperlink to the “hey if you know this stuff you should skip to the POC section”. I’ll make sure that gets updated/more obvious.
What I read does a great job of explaining the technology; I know it pretty well but it's great to have an updated, clear, concise, well-organized, integrated explanation all in one place. I can't imagine how long that took to make it that clear!
(Everything posted here gets similar complaints about the writing, headline, too short, too long, too hot, too cold, etc. Goldilocks is never pleased here. Welcome to HN! :)
Good call. For a general IT audience you are speaking to, context building really helps refresh the scene before dropping the exploit.
Well communicated.
The PoC section doesn't explain the issue. I think a one-line TL;DR similar to the summary above would be best, e.g. "A malicious DHCP server can use DHCP Option 121 to set routing rules, which can override the routing rule used by VPNs and cause traffic to be routed outside the VPN"
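For anyone wondering what "setting routing rules via Option 121" looks like on the wire, here is a minimal sketch of the classless static route encoding from RFC 3442: a prefix-length byte, only the significant octets of the destination, then the four-byte router address. The function name `encode_option_121` is made up for illustration, and the example route is the `10.0.0.0/8 via 10.9.9.9` one used later in this thread.

```python
import ipaddress

def encode_option_121(routes):
    """Encode (destination_cidr, router_ip) pairs as a DHCP option 121 field.

    Hypothetical helper for illustration; follows the RFC 3442 layout.
    """
    payload = bytearray()
    for cidr, router in routes:
        net = ipaddress.ip_network(cidr)
        # Only the octets covered by the prefix length are transmitted.
        significant_octets = (net.prefixlen + 7) // 8
        payload.append(net.prefixlen)
        payload += net.network_address.packed[:significant_octets]
        payload += ipaddress.ip_address(router).packed
    if len(payload) > 255:
        raise ValueError("a single DHCP option holds at most 255 bytes")
    # Option code 121, one length byte, then the route descriptors.
    return bytes([121, len(payload)]) + bytes(payload)

encoded = encode_option_121([("10.0.0.0/8", "10.9.9.9")])
print(encoded.hex())  # 7906080a0a090909
```

A more specific destination simply costs more bytes: a /32 carries all four destination octets, while a /0 default route carries none.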
(I like it that you provide the background for people who need it, but also found the actually relevant information extremely annoying to find.)
Or they could have maybe led with that sentence and THEN given the explanation.
Too many tech people have that "I want to slowly lead you to the point like a Sherlock Holmes mystery" style of writing, and it is such a waste of time. Arthur Conan Doyle was paid by the word, you aren't. Please, everyone, back to middle school: State a Thesis in your first sentence and THEN expand on it, don't force me to spend pages trying to figure it out.
It's not just tech people, but any field with a high enough complexity.
The "abstract" of a journal article is supposed to contain all the key points of a science experiment including the results, but it's too rare that they do.
I think some folks are just hitting their limits, and needed more time to digest/ review their publication.
Other folks are doing it to obfuscate or pad their work, for whatever reason.
When you're deep enough in a thing it can be hard to know what counts as "high level summary." For example, "attackers can decloak routing-based VPNs" might seem like a good high level summary. "Attackers can decloak routing-based VPNs using DHCP rules that give priority to an attacker over other lower priority routes" might seem like it's just in the weeds enough to be misleading, or to result in a bunch of people now believing they are educated on the subject when they really are not.
Picking the right level to communicate such that you avoid clickbait journalists spreading a lie of omission/ hysteria is an art. Personally, I think we should be grateful for all the effort put into clearly communicating all the most relevant nuances; we can generalize that any high complexity field is doing its readers a service when it approaches communication this way. I'd rather the "result" be communicated at too high a level than too close to the middle (giving the illusion of understanding the nuance)
Just accept you were not the target audience and skim like the rest of the world. Not every article is written for you. It's available for you to read, but was more than likely not with you in mind. Some of us still like words and the reading of them when they provide details and more in-depth understanding than a tweet.
I hate tweet culture as much as anybody, but this is not the alternative. This article is so painfully long, I got bored even just trying to skim it. Reading it word for word will turn any noob into a seasoned greybeard through the sheer passage of time. If you really like words and reading them this much, I'd recommend adding some dictionaries to your reading list.
Have you identified actual VPN vendors that are affected by this? I won't disclose which ones I use, but I would love to know if they have been affected.
I looked at this in detail. This exploit is a nothing-burger for most decent VPNs.
A simple "leak protection" (aka Killswitch) firewall rule completely negates this attack.
All decent VPNs implement such a rule by default.
Dealing with undesirable routes (whether pre existing or pushed by a DHCP server) is nothing new or in the slightest bit hard to defend against.
If a VPN does not implement such a firewall rule already then it's likely already leaking so all this exploit demonstrates is that "A VPN without leak protection, leaks".
(I won't even mention the "side channel" attack as it's completely ridiculous)
I liked your write-up and option 121 is a little known option, so it's good to know about. But let's not pretend this thing is bigger than it is.
FTA: Importantly, the VPN control channel is maintained so features such as kill switches are never tripped, and users continue to show as connected to a VPN in all the cases we’ve observed.
Most practical VPN services don't actually implement it this way. It's a somewhat difficult and rather OS-specific problem, depending on the firewall services offered by the OS. On some popular OSes, like mobile ones, it's just not possible at all.
So just to grab an example, NordVPN's implementation does indeed work as the article presents: it monitors the VPN and disables network access for applications if the VPN connection drops. This is indeed vulnerable to any number of potential problems, and depending on the OS and user savvy you can set up better protection using e.g. the iptables owner module. It's very non-portable though, sometimes even between Linux distributions, and hard to support at scale. Actually I'd say a true "no access except through the VPN" rule is easiest to implement on Windows, but NordVPN doesn't seem to do it there either, I'm not sure why.
To be fair, it's right in the name: a kill switch is a switch that kills things. It isn't proper network policy like per-process routing tables that are, unfortunately, difficult to implement for consumer machines.
Mobile is an exception (but they already state android is immune), let's stick to desktop for the sake of discussion, the 3 major desktop platforms: mac, win, linux :)
On mac - just implement a block everything rule with pf and then just allow traffic on the tunnel and whitelist the VPN endpoint. Boom, a kill switch that defends against this exploit. And there's no racy nordvpn-style "control channel" (if nord really works like this i have an even lower opinion of them than i do currently).
On linux - iptables (for example) - just implement a general DROP policy then override with a specific ALLOW on the tunnel interface.
On Windows - Use WFP to implement a block everything rule, then provide a higher priority rule to allow on the tunnel interface.
All three of these techniques are the recommended way to implement a kill switch and it's used heavily in the VPN industry by anyone sensible. It completely defends against this TunnelVision exploit too.
The way you suggest a kill switch is implemented (reactive and monitoring the connection?) is very fragile, racy and prone to leaks. i absolutely would not trust it and it shouldn't even be called a kill switch. It's an embarrassment. :)
I'm not saying how it should be implemented, I'm saying how it is implemented by a number of popular VPN services. Take up the argument with them. There are VPN providers that do it right on at least some platforms, but unfortunately, given the way most of them document it, it's very hard to tell without experimenting with the client to verify.
As far as I know, use of the term "kill switch" closely correlates with an untrustworthy implementation. Consider the case of Mullvad who handle this a lot better and also decline to call it a "kill switch" for that reason. And that's not to say that Mullvad is perfect, easy to find forum threads by people who had traffic leakage for various reasons. I wouldn't trust anything you didn't set up yourself.
You wrote three different ways to end up with all traffic dropped and a broken VPN connection.
Traffic sent to the VPN interface gets encapsulated by the VPN client software and then routed to the Internet. If your firewall rule is dropping all traffic not destined for the VPN interface, it will drop the encapsulated traffic.
You need two (sets of) rules: one allowing traffic on the VPN interface and one allowing traffic which is already cloaked by the VPN software (or not cloaked, but used to establish/maintain the tunnel itself). That second category is a bit complicated, because you need to be able to route to the VPN server regardless of which network you're connected to - and the DHCP server tells you how to do that.
Yes. I just provided simplified firewall rules in my answer. You also need to whitelist either the VPN endpoint itself (and add a route to that endpoint) or you need to whitelist the process (such as wireguard or openvpn) that hits that endpoint.
Not sure how a DHCP server is relevant in the slightest here except for the initial host network config of course. But the host network should already be configured before the VPN comes up.
Source: i've implemented this dozens of times (and you probably have too, it sounds like) so let's not quibble over the details ;)
If you're connected to a random network, whose configuration you don't know in advance, how do you route packets to your VPN server?
The usual answer is that the network's router tells you how to do that, by supplying DHCP options.
The point I'm making here is that you can't just configure a firewall rule and have it work properly. What actually needs to happen is that the VPN client software is using one routing table - let's call it "host routing" - and everything else on the system is using a second routing table - let's call that "VPN routing".
The DHCP server inserts rules into the host routing table, and the only software using those rules is the VPN client for its management and tunnel traffic.
Otherwise, what if the network to which you connect says "the next hop for all internet traffic is 10.10.10.10"? You need to respect that rule when sending traffic to your VPN server, and ignore it for applications whose traffic will be tunneled.
Let's walk through this step by step because there's a lot of confusion on your end.
* Step one - You connect your computer to a network - yes you'll get a DHCP lease, and you'll get an ip address, and a default gateway. This default route will be added to your routing table.
* Step two - If the TunnelVision exploit (DHCP option 121) is at play you'll also get a few MORE SPECIFIC routes than the default gateway. These also get added to your routing table
* Step three - You connect your VPN. The VPN will bring up a firewall. It will also bring up `128/1` and `0/1` routes that point at the VPN tunnel. The VPN tunnel now takes over the default route. This firewall will block all traffic that's not on the tun device (the VPN interface). Further, it will whitelist the VPN endpoint IP and create a route for it (it can do this since it already received the default gateway from the DHCP server)
* Step four - Your host starts sending traffic - either this traffic will go through the VPN tunnel (the default route) OR it will attempt to go through the more specific option 121 pushed malicious routes added by the compromised DHCP server (depending on the destination ip of the outbound packets).
* Step five - All traffic that would go down the malicious option 121 routes is BLOCKED by the firewall rule, nullifying the TunnelVision exploit.
That's all. Done. Where's the complexity in that? As i said before i've done this dozens of times. I'm talking from experience. I know this works.
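The route selection in steps three and four comes down to longest-prefix matching, which can be sketched in a few lines. This is only a model, not anyone's actual client: the interface name `eth0`, the endpoint `203.0.113.10`, and the pushed `198.51.100.0/24` route are placeholder values (documentation address ranges) chosen for illustration.

```python
import ipaddress

def lookup(table, dst):
    """Return (prefix, interface) for the longest prefix that contains dst."""
    addr = ipaddress.ip_address(dst)
    matches = [
        (ipaddress.ip_network(prefix), dev)
        for prefix, dev in table
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    net, dev = max(matches, key=lambda match: match[0].prefixlen)
    return str(net), dev

TABLE = [
    ("0.0.0.0/0", "eth0"),         # step one: default route from the DHCP lease
    ("198.51.100.0/24", "eth0"),   # step two: route pushed via option 121
    ("0.0.0.0/1", "tun0"),         # step three: VPN "default" halves
    ("128.0.0.0/1", "tun0"),
    ("203.0.113.10/32", "eth0"),   # step three: host route to the VPN endpoint
]

for dst in ("8.8.8.8", "198.51.100.7", "203.0.113.10"):
    print(dst, lookup(TABLE, dst))
```

The two /1 routes beat the /0 default, the pushed /24 beats the /1 routes, and nothing can beat the /32 to the endpoint. Whether the packet that picked the /24 actually leaves the machine is then up to the firewall, which is what step five is about.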
Further you say:
> The point I'm making here is that you can't just configure a firewall rule and have it work properly. What actually needs to happen is that the VPN client software is using one routing table - let's call it "host routing" - and everything else on the system is using a second routing table - let's call that "VPN routing".
You are aware we're talking about consumer VPNs right? The majority of users are on Windows and Mac. Neither of those OSes support multiple routing tables. Only Linux supports multiple routing tables.
You're also just plain wrong - as i demonstrated above - you CAN just configure a firewall rule and it WILL just work properly. Again, i'm talking from experience.
In your step 4, what happens when the VPN traffic gets routed over option 121 pushed routes?
Don't you block it - thus blocking your entire VPN?
> OR it will attempt to go through the more specific option 121 pushed malicious routes added by the compromised DHCP server (depending on the destination ip of the outbound packets).
This right here... we don't want our VPN-secured traffic going out over routes broadcast by the malicious DHCP server, so you block it... right?
How does that traffic leave the local network and reach the VPN server?
Read my reply to the other poster, i answer exactly this. Actually test it yourself. Stop theorizing. I tested it. It works exactly as I said.
I think i know where you're confused. There is a firewall whitelist on the VPN endpoint route. Also it's impossible for the DHCP server to push a route more specific than this since it's a /32 route, so it's unaffected (together with the firewall rule allowing it) by anything the DHCP server attempts to do.
I think you might be saying to add rules like `iptables -A OUTPUT -d <vpnserver>/32 -j ACCEPT`, `iptables -A OUTPUT -o vpn0 -j ACCEPT`, and `iptables -A OUTPUT -j DROP`.
I'm a bit confused though because you only mentioned one rule and that's three. But also, I think using that combination of rules would result in dropping all traffic that someone attempts this attack against - in other words, turning it into a denial-of-service attack instead of a loss-of-confidentiality one.
But there's no technical need to drop the maliciously-routed traffic, is there?
Yes exactly. It becomes a "denial of service" against the option 121 pushed subnet routes. That's already discussed in the paper, i assumed you knew that already.
There's nothing else you can do in this situation other than detecting and then removing those routes, which is possible. In lieu of deleting the routes the best and most secure option is to block that subnet. A DoS is infinitely better than a LEAK.
There are SOME things you could do (other than just removing the routes) to prevent the DoS i guess if you REALLY wanted - there are some packet rewriting capabilities in the mac pf firewall and windows WFP would support this too (though it would require a 'callout' driver (kernel code) at the IP_OUTBOUND layer), and linux allows something like this too with fwmarks and multiple routing tables + a source NAT, but it's not really worth the effort in a rare case like this. Easiest just to let those packets be blocked. The network you're on is controlled by a bad actor with a malicious DHCP server. Best option for you is to GTFO.
What I do is put the VPN client into a tagged network namespace (yes, fwmark), and then have a routing rule that makes everything else use a separate routing table.
The DHCP server inserts rules into the routing table used only by the VPN client.
Doing it that way, all leaks are prevented, and also there's no way to denial-of-service traffic within the tunnel - no matter what routes are pushed, it keeps flowing as normal.
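A rough model of that two-table arrangement, for readers who haven't used policy routing: packets carrying the VPN client's mark consult the "host" table, everything else consults the "vpn" table, and DHCP-pushed routes only ever land in the former. The mark value, table names, interface names, and addresses below are all illustrative assumptions, not taken from a real configuration.

```python
import ipaddress

VPN_CLIENT_MARK = 0x1  # hypothetical fwmark carried only by the VPN client's packets

def longest_match(table, dst):
    """Return the interface of the longest prefix in table that contains dst."""
    addr = ipaddress.ip_address(dst)
    matches = [(ipaddress.ip_network(p), dev) for p, dev in table
               if addr in ipaddress.ip_network(p)]
    return max(matches, key=lambda m: m[0].prefixlen)[1] if matches else None

tables = {
    "host": [("0.0.0.0/0", "eth0")],  # filled in from the DHCP lease
    "vpn": [("0.0.0.0/0", "tun0")],   # used by every other process
}

def route(mark, dst):
    """Pick the routing table from the packet's mark, then do a normal lookup."""
    name = "host" if mark == VPN_CLIENT_MARK else "vpn"
    return longest_match(tables[name], dst)

# A malicious option 121 route only ever reaches the host table.
tables["host"].append(("198.51.100.0/24", "eth0"))

print(route(0, "198.51.100.7"))                 # application traffic
print(route(VPN_CLIENT_MARK, "203.0.113.10"))   # the VPN client's own traffic
```

Because application traffic never looks at the host table, the pushed route can neither divert it nor cause it to be dropped.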
Yeah, lots of cool stuff you can do with Linux. just wish that the other OSes were half as good, unfortunately most of them require kernel code to do what would be a simple shell script in linux
The only problem with this person's comments is saying "you're wrong" "you're confused" so much.
The actual content is 100%.
Get over the "you're wrong" tone and ingest the tech message.
It's really a misnomer to call the firewall a kill switch since it isn't reacting, it's already in effect, already blocking the bad traffic before the bad traffic happens. No switch is thrown.
Any vpns that DO work that way are silly and should not be used. If this is most popular commercial vpns today, oh well so be it.
The articles going around saying "affects all vpns and nothing can stop it" are also just silly and wrong. But it is probably true that most convenient vpns are currently leaking.
I can see how you can write rules that block "bad traffic", but I can't see how you write them so they don't also block some "good traffic" when the network assigns a routing rule.
I think the person here might be glossing over writing overzealous rules that cause the VPN connection to go down when an Option 121 route is assigned, when the ideal solution leaves the VPN functional (and causes tunneled traffic to ignore the route).
I don't understand your explanation because you just keep alluding to certain firewall rules but not actually showing them.
If you've done this, could you paste an `iptables -L -v` for me? That would make clear exactly what you're talking about. If there is a problem, I could then point it out, and if there is not, I could then understand how to do what you're saying.
Step one, you connect to the network and get routes.
Step two, you connect your VPN.
Step three, your host starts sending traffic. At this point, your firewall rules are now active and dropping any traffic you've told them to.
Step four, you renew your DHCP lease and get new routes via option 121. Those routes might be malicious, or they might not.
One of three things is true at step four. Either:
A. Your firewall rules will block all traffic over the new routes
B. Your firewall rules will not block any traffic over the new routes
C. Your firewall rules will block some subset of traffic over the new routes
If A is true, then your VPN tunnel goes down (undesirable), as the VPN server can no longer be contacted.
If B is true, then you are vulnerable to the TunnelVision exploit.
If C is true, and the subset of traffic blocked is exactly the subset intended to route over the VPN but maliciously diverted, then the VPN tunnel goes down because the firewall rules are blocking its traffic.
If C is true, and somehow the firewall rule is rewriting the traffic that's pointed not-over-the-VPN to be instead routed over the VPN (by using NAT?), then the VPN tunnel stays up and there is no problem.
I'd be interested in seeing the set of firewall rules that will let the VPN tunnel stay up, with management traffic going over the added-after-the-tunnel-was-brought-up next-hop, and tunneled traffic continuing to flow ignoring the new route. I haven't seen those rules in the past so if you have experience writing them, please show me.
Personally I've only used Linux multiple routing tables to plug this leak.
No, you're wrong again. I just tested this (simulating routes added by a DHCP option 121) and it works exactly as I said.
C is what happens. But it doesn't happen the way you say at all.
Only the traffic heading to the new 121 routes is blocked - why is it blocked? because the routes are on the physical interface, and the firewall rules block all off-VPN traffic (except traffic to the VPN endpoint itself)
The tunnel stays up because the tunnel connection is over the physical interface. The VPN endpoint has a physical route from the host to the VPN endpoint which is whitelisted in the firewall. So new physical routes (which option 121 would push) don't impact anything as VPN endpoint route is physical anyway. Also it's impossible for the DHCP server to push a route MORE specific than the endpoint route (which is a /32) that already exists, so it can't be overridden (and it wouldn't matter anyway since it would still be a physical route, which is what is desired here).
Can you stop just talking and actually TRY it? it's all theoretical for you since you're not actually testing it and your theory is completely wrong.
Could you please show me a rule so I can "try it"? I'd love it if this worked.
Let's say:
- the physical interface name is "wlan0"
- the VPN virtual interface name is "tun0"
- the VPN server is 10.1.1.1 on TCP port 8888
- the DHCP server on initial lease sends "0.0.0.0/0 via 10.8.8.8"
- the DHCP server on renew sends the above rule and also "10.0.0.0/8 via 10.9.9.9"
- Before the renew, traffic not routed via 10.8.8.8 is blocked
- After the renew, traffic with a destination IP matching 10.0.0.0/8 not routed via 10.9.9.9 is blocked
What should the firewall rule look like to let the VPN connection stay up both before and after the new rule, while also ensuring that traffic to 10.7.7.7 goes via the VPN and not via 10.9.9.9 or 10.8.8.8?
EDIT: you keep saying "I'm wrong" but I'm just asking how this firewall rule can be structured to do what you say. It occurs to me that perhaps you're saying you can make traffic for 10.7.7.7 get blocked. But in the above, and before, what I'm asking for is how to make traffic for 10.7.7.7 continue to get sent over the VPN after the new rule addition, just like it was before - in other words, no dropped packets.
Your examples are strange as you're using rfc1918 addresses (i.e private range) rather than public ips. So all your examples are very odd.
10.7.7.7 will get dropped. This is correct behaviour based on the routing rules in your example. The VPN connection will stay up as it has a /32 route setup through the wlan0 interface.
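The scenario above is small enough to simulate, which may help settle what happens to each destination. This sketch assumes the VPN client installs `0.0.0.0/1` and `128.0.0.0/1` on tun0 plus a /32 host route to 10.1.1.1 over wlan0, as described earlier in the thread, and models the firewall as three first-match rules: allow the VPN server, allow tun0, drop the rest. It ignores ports and protocol entirely.

```python
import ipaddress

VPN_SERVER = "10.1.1.1"

def pick_route(routes, dst):
    """Longest-prefix match over (prefix, next_hop, interface) tuples."""
    addr = ipaddress.ip_address(dst)
    candidates = [r for r in routes if addr in ipaddress.ip_network(r[0])]
    return max(candidates, key=lambda r: ipaddress.ip_network(r[0]).prefixlen)

def verdict(routes, dst):
    """Route the packet first, then apply the three firewall rules in order."""
    _prefix, _via, dev = pick_route(routes, dst)
    if dst == VPN_SERVER:      # rule 1: allow the VPN endpoint
        return dev, "ACCEPT"
    if dev == "tun0":          # rule 2: allow anything leaving via the tunnel
        return dev, "ACCEPT"
    return dev, "DROP"         # rule 3: drop everything else

before = [
    ("0.0.0.0/0", "10.8.8.8", "wlan0"),    # initial DHCP lease
    ("10.1.1.1/32", "10.8.8.8", "wlan0"),  # host route to the VPN server
    ("0.0.0.0/1", None, "tun0"),           # VPN routes
    ("128.0.0.0/1", None, "tun0"),
]
after = before + [("10.0.0.0/8", "10.9.9.9", "wlan0")]  # pushed on renew

for label, routes in (("before", before), ("after", after)):
    for dst in ("10.7.7.7", VPN_SERVER, "8.8.8.8"):
        print(label, dst, verdict(routes, dst))
```

Under these assumptions the tunnel stays up after the renew and nothing leaks, but traffic to 10.7.7.7 moves from "sent over tun0" to "dropped", which is the denial-of-service both sides of this argument are describing.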
What we're concerned about with TunnelVision is preventing LEAKS, this is what constitutes VPN security. If the DHCP server sends a route that forces traffic through the physical interface instead, then the best you can do is BLOCK it. If you really wanted to you could detect and remove the route, but that's a different question and still has nothing to do with the question we're concerned about which is plugging LEAKS.
Also why on earth are you using rfc1918 addresses as an example, it's more meaningful to be using public ip addresses - i.e the DHCP server pushes a 1.1.1.1/32 route. The issue with rfc1918 addresses is it's totally not clear if they will get routed at all as you didn't specify whether they're on-link or not as you didn't provide a subnet mask.
The rules are:
* an ALLOW rule on the vpn endpoint ip + a /32 route for the endpoint
* an ALLOW rule on tun0 traffic
* a BLOCK rule on EVERYTHING else - i.e. just an `iptables -A OUTPUT -j DROP` appended after the two ALLOW rules
Can we please stop the back and forth now, it's really starting to chew up too much time. I don't want to have a conversation about the specific addressing scheme you're using - you should have used public ips to make things much clearer, i don't want a long argument now about subnets and whether certain addresses in your example are on-link or require a routing hop. This is so tiring. :)
10.7.7.7 shouldn't get dropped, because it's supposed to be routed over the VPN. The DHCP server shouldn't be able to cause VPN-bound traffic to be dropped, in my opinion.
As I replied to one of your other comments, I don't think making the attack go from privacy-breach to denial-of-service is "preventing" the attack: you've only partially mitigated it. Full mitigation requires more than the firewall rules you've described.
Said differently, a malicious DHCP server should not be able to denial-of-service traffic within your VPN (or, by selectively pushing routes and then observing the impact on generated traffic, probabilistically determine the IPs with which you're communicating!).
Ok. Well, the attack is so rare that i don't believe putting mitigations against the DoS is worth the effort. The mitigations are not that trivial (though it's arguable that just removing the route is kind of trivial, but still not worth it IMO). Better a DoS than a leak in any case :)
Let's be honest, there are 2 OSes on desktop that matter to VPN providers that provide an "all-in-one app" - Windows and macOS. Both have firewalls that are easy to configure from your application.
On linux, well, you have to choose:
- use iptables style rules regardless of the backend
or
- use nftables style rules regardless of the backend.
So it's 3 firewalls that you have to think about.
On mobile, well, on mobile you're mostly at the mercy of the platform owner and generally can't do much. Hence, why connecting phone to "SoftAP/ether_g + VPN" device is better than a direct connection.
"closing specified programs" has to be the silliest thing i've ever heard. By the time you close it, it's probably already leaked thousands of packets. The leak of a SINGLE packet is already too much.
Such an "application killing kill switch" is just marketing fluff, it makes zero sense from a security standpoint.
I can't check your fascinating answers but a friend is, and is beginning to concur, so I'll take their advice and yours as closer to correct. Thank you. Next question: where have you been for so long, and why are you only back now giving such juicy answers?
Let me explain how a _very_ basic setup works: you set up firewall rules allowing only the connection to the VPN on all interfaces except your VPN interface.
If you're running a torrent box, then you can do whatever your OS equivalent of "this process uses this routing table" is. My seed box was using interfaces that were set up in dom0, and the guests didn't even know about a way to reach outside without a VPN connection being established by the host.
The point is - "such" attacks have no legs against anything beyond "OpenVPN: Getting Started" kind of server.
> A VPN kill switch is a must-have for privacy reasons when using a VPN. If you are actively using the VPN to transfer data and your Internet connection becomes unstable or drops, the entire network connection on your device will be blocked to prevent your local IP address from being exposed to the outside world. The kill switch is on by default on Windows, Android, iOS, macOS and Linux and there is no setting to turn it off.