Tl;dr: By placing load on suspected hidden service hosts (over normal, non-anonymous IP), one can then measure the change in clock skew (as a result of higher CPU / chassis temperature from the higher load) over the anonymous channel to confirm it is the same host (by comparison with clock skew before load).
The result holds over many hops and onion layers (this is the usual "the average of random noise is zero" thing). Very cool.
If the machine has no public services running, would that attack still work? What if it's behind NAT or a hw firewall with ssh exposed only via port knocking?
It needs to be connectible over IP (and TCP, I think?) for it to talk to Tor, which is necessary for running a hidden service. You could imagine an anonymizing network where that is not necessary, I guess, by having the service connect to some other internal node over NAT. (Although then you could just Sybil attack with a bunch of nodes and wait for the target to connect to you.)
This is a good demonstration why security and privacy are hard. Just think about it for a second: Load on the CPU affects the clock and you can measure this clock skew remotely. There are so many possible interactions in a modern computer (and even more you're unaware of) and it looks like every single one of them has to be considered a side channel.
Inducing system load on a Tor hidden service, to generate heat from the CPU, to increase temperature of the quartz crystal driving the system clock, to cause system clock skew, which is remotely detectable via TCP sequence numbers (on systems that derive the initial sequence number from a clock rather than purely from rand()), or more directly by TCP timestamps (RFC 1323).
This lets you try to check if a given hidden service is running on a known machine, or if two hidden services are running on the same machine.
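To make the "measure the skew" step concrete, here is a rough sketch, not the paper's actual method. It assumes you have already collected pairs of (your local receive time, the remote timestamp) and fits the drift of their difference with plain least squares, which is a stand-in for whatever fitting the authors use. The names estimate_skew_ppm and simulate are made up for illustration, and the jitter is a toy uniform model of network delay.

```python
import random


def estimate_skew_ppm(samples):
    """Least-squares slope of (remote - local) against local time, in ppm.

    samples: list of (local_seconds, remote_seconds) pairs.
    """
    n = len(samples)
    xs = [local for local, _ in samples]
    ys = [remote - local for local, remote in samples]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var * 1e6


def simulate(true_skew_ppm, n=2000, interval=1.0, jitter=0.005, seed=0):
    """Toy data: a remote clock running fast by true_skew_ppm, plus delay jitter."""
    rng = random.Random(seed)
    samples = []
    for i in range(n):
        local = i * interval
        remote = local * (1 + true_skew_ppm * 1e-6) + rng.uniform(0, jitter)
        samples.append((local, remote))
    return samples


if __name__ == "__main__":
    idle = estimate_skew_ppm(simulate(50.0, seed=0))
    loaded = estimate_skew_ppm(simulate(52.0, seed=1))
    print(f"idle: {idle:.2f} ppm, under load: {loaded:.2f} ppm")
```

The attack then comes down to comparing the estimate before and after you apply load over the public IP; a shift that tracks your load pattern points at the same machine.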
Could you skip everything in the middle, since request latency is correlated with system load? You have to load the server in either case, so both are active attacks. I think the problem is that latency is so variable due to Tor itself, it's actually faster to measure server load through clock skew than through request latency.
How would you find a candidate public server to run this attack against? "Many hidden servers are also publicly advertised Tor nodes, in order to mask hidden server traffic with other Tor traffic, so this scenario is plausible." But I think you would run your public Tor relay on a different machine behind the same firewall, since you want the absolute minimum number of processes running on the machine actually hosting the hidden service.
(My comment on this from yesterday, but it's back on the front page as a new submission)
> Could you skip everything in the middle, since request latency is correlated with system load? You have to load the server in either case, so both are active attacks. I think the problem is that latency is so variable due to Tor itself, it's actually faster to measure server load through clock skew than through request latency.
I don't think it's only the variable latency due to Tor, but also the fact that latency is tightly enough correlated with the load and the software and hardware configuration that it can't be reliably used for fingerprinting. Clock skew, on the other hand, occurs due to various fabrication parameters not being constant; it has a random element that can be reliably used to fingerprint physical machines. To put it another way, two identical machines -- built out of identical components, running perfectly identical software, on entirely identical storage media, with exactly the same bits written in exactly the same location of the hard drive and RAM -- placed behind a NAT would be impossible, or at least much harder to discern based on latency alone, while being comparatively easier to discern by clock skew.
Isn't this trivially defeated by deliberately running the CPU on Tor nodes at 100% all the time, pointlessly burning cycles if there's no traffic to pass? Obviously there's a heat and power consumption consequence in doing that...
Could mitigate that with the heater job running at a nice level of 19 or so. Will keep the CPU under load, but server tasks will run with only a slight degradation.
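A minimal sketch of such a heater job, assuming a Unix-like host where Python's os.nice is available. The function name heater and its duration parameter are hypothetical, and a real one would loop forever with one process per core.

```python
import os
import time


def heater(duration_s, niceness=19):
    """Busy-loop for duration_s seconds at the lowest scheduling priority.

    Returns the number of loop iterations, just so callers can see it ran.
    """
    if hasattr(os, "nice"):
        # os.nice adds to the current niceness; the OS caps the result (19 on Linux).
        os.nice(niceness)
    deadline = time.monotonic() + duration_s
    spins = 0
    while time.monotonic() < deadline:
        spins += 1
    return spins


if __name__ == "__main__":
    print(heater(1.0), "iterations in one second")
```

Because the scheduler only hands the loop cycles nobody else wants, the CPU stays busy (and warm) while real requests still get served first.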
This means that the attack requires more data, but doesn't make it impossible. Fundamentally, adding randomness to your timestamps is adding noise to a signal. By sampling the signal repeatedly, you can average out the noise.
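A quick way to convince yourself of the averaging argument, as a toy simulation only: the noise here is uniform and independent per sample, which is an assumption, and the function names are invented for the example.

```python
import random
import statistics


def averaged_estimate(true_value, noise, n, rng):
    """Mean of n readings, each perturbed by uniform noise in [-noise, +noise]."""
    readings = [true_value + rng.uniform(-noise, noise) for _ in range(n)]
    return statistics.fmean(readings)


def rms_error(true_value, noise, n, trials=200, seed=0):
    """Root-mean-square error of the averaged estimate over many trials."""
    rng = random.Random(seed)
    squared = [
        (averaged_estimate(true_value, noise, n, rng) - true_value) ** 2
        for _ in range(trials)
    ]
    return statistics.fmean(squared) ** 0.5


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, rms_error(0.0, 1.0, n))
```

The error shrinks roughly with the square root of the number of samples, so 100x more requests buys the attacker about 10x more precision. The defender's jitter raises the cost of the attack rather than removing the signal.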
I'd figure simply quantizing timestamps with larger step sizes would work better. And sure, you can filter those out, but you can also make it take too long to be useful. You could also perform the attack on yourself and adjust accordingly, although this is not robust since it depends on details of each attack.
I think that might be worse. By polling your system and waiting for the clock to roll over, an attacker can almost immediately narrow down your clock to an accuracy equal to their polling interval.
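Roughly what I mean, as a simulated sketch in integer microseconds. Here quantized_clock stands in for the server's coarse timestamp and find_rollover for the attacker's polling loop; both names are made up, and network delay is ignored entirely.

```python
def quantized_clock(true_time_us, step_us):
    """The coarse timestamp a server would expose: time rounded down to a step."""
    return (true_time_us // step_us) * step_us


def find_rollover(read_clock, start_us, poll_interval_us, max_polls=1_000_000):
    """Poll until the coarse timestamp changes.

    Returns (last_poll_before, first_poll_after, new_coarse_value). The true
    tick happened somewhere inside that bracket, which is one poll interval wide.
    """
    t = start_us
    prev = read_clock(t)
    for _ in range(max_polls):
        t_next = t + poll_interval_us
        cur = read_clock(t_next)
        if cur != prev:
            return t, t_next, cur
        t = t_next
    raise RuntimeError("no rollover observed")


if __name__ == "__main__":
    step = 1_000_000  # server exposes whole seconds only
    lo, hi, tick = find_rollover(lambda t: quantized_clock(t, step), 373_000, 10_000)
    print(f"tick to {tick} us happened between {lo} and {hi} us")
```

So a 1 s quantization step with 10 ms polling still pins the clock edge down to 10 ms, not 1 s.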
Either way, though, more requests will defeat it one way or another. Whether you can make such an attack impractical will come down to how many requests the attacker can make versus how much noise you can tolerate in the timestamps.
White noise would broaden the distribution of time making the attack harder. A more interesting noise distribution could really obfuscate things by introducing multiple arrival time peaks. A temporally varying multipeak distribution would make life really hard.
The clock on a CPU is actually an external crystal, but usually runs at about 100MHz. Inside the CPU, this clock is multiplied by some constant to produce a signal in the GHz range (using a phase-locked or delay-locked loop). Then the clock is distributed across the die with a large number of buffers called a clock tree.
I suspect the measured skew comes from those on-die components that can't easily be placed outside the package.
A modern PC actually has several clock crystals; there's usually at least a high-frequency one that generates the main CPU/chipset clock, and a much lower frequency one (32.768 kHz) for the realtime/battery-backed clock. On the motherboard I'm currently using I see two more, one for the Ethernet chip and another on the audio codec. They would all drift somewhat with temperature, but I think for this system at least, using the RTC clock for coarse timing would probably have the least temperature variance as it's far away in a corner of the board and at the bottom, away from any heat-producing components.
I suppose you could also use some sort of GPS-disciplined oscillator.
Couldn't you just not allow your box to be accessed outside of the anonymous network? It's a neat trick, but how likely is it that you can trace two different services to the same server anyways?
No. Sadly, Tor doesn't work like a VPN, presenting itself as a network interface responsible for a virtual subnet you can restrict your connections to. Instead, Tor connections are plain-old IP connections from random public IPs. Which means that your hidden service needs to accept connections from random public IPs--and, therefore, to be public itself. A hidden service might not have a published IP address, but it must have one that you could, potentially, connect to over the plain internet, in order for the proximate Tor node to it (what would be called an "exit node" if it were a plain site) to be able to talk to it.
In theory, you could hack on the Tor client so that clients and servers did an SIP negotiation prior to connecting on the desired port. You'd then run something on the host which would act like a port-knocking daemon, temporarily allowing new connections on a port only in response to a request from the Tor client, and only to the SIP peer in that message.
(Or, Tor could just present itself like a network interface, giving each N-proxied-peer a virtual IP that changes whenever it regenerates its identity. "Hidden service" connections would be regular IP-to-IP communication. For "public" connections, exit nodes would need to be running SOCKS proxies, and then there could be an anycast IP address that picked a proxied-exit-node at random. Then you'd just set that as your plain-old SOCKS proxy in your browser.)
> Which means that your hidden service needs to accept connections from random public IPs--and, therefore, to be public itself. A hidden service might not have a published IP address, but it must have one that you could, potentially, connect to over the plain internet, in order for the proximate Tor node to it (what would be called an "exit node" if it were a plain site) to be able to talk to it.
This is not at all correct. The IP address of the hidden service is masked in exactly the same way that its clients' IP addresses are. That is, the client and service connect across the Tor network to a client-chosen onion router known as the 'rendezvous point', through which they set up a shared circuit.
Er. To interpret what I said the way you did, is to assume I was saying "Tor does nothing, and clients connect directly to servers", which is kind of... silly, to say the least.
The point I was making was that the proximate node to the hidden service--the last one in the onion-routing chain--connects to its destination by using its public IP to talk to the hidden service's public IP. From the perspective of the hidden service, the node proximate to it in the onion-routing chain is a regular Internet peer, which is impossible to distinguish from any other regular Internet peer.
In the end, what Tor gives you is a proxy (to a proxy, to a proxy.) And, from the server's perspective, there's no difference between a proxy and a regular client. It can't tell, by the IP, that the client it's speaking to is a proxy. And because of that, you cannot, at the server-level, block non-proxied clients from speaking to you. Because you don't know which those are.
It's not a proxy in the way you are describing. The node (onion router) most proximal to the hidden service is always connected to by the hidden service, not the other way around.
Thus it is absolutely fine to firewall off all inbound connections on the host running the hidden service, as it will only be making outbound connections - and even those are to a limited set of IP addresses as defined by the guard nodes it has chosen for entry into the Tor network.
I don't know much about the details, but the tor faq says you can host a hidden service even behind a nat. Wouldn't that protect against being exposed on a public ip?
I'm pretty sure that any competent tor hidden service operator will know not to run anything else on the same hardware. It's basic OPSEC that you want your sensitive Tor-connected machine to only make/accept connections through Tor.
Yes, this would prevent this attack as described as the 'measurer' would be unable to collect any data. Though, the author presents some interesting (albeit untested) alternatives in section 5.4 of the paper.
It's not hidden. The whole idea is that you expose the hidden service's location by doing this; if the location of the service is already known, there is no point.
So vulnerable implies that there is something to be gained.
- Most services are IO bound, not CPU bound, so pegging the CPU to max might prove a non-trivial task.
- Well-designed services don't overload until they're maxing out on either CPU or any other resources, they just serve up to some capacity (say 80%) and then start flat out refusing requests with response semantics like "come back later".
- Timestamp sources are typically not from clocks originating inside the CPU.
You won't need to peg the CPU, you only need to get it to warm up a little bit, enough to create a skew that can be detected. Worst case that means that you need to wait longer but it will still work.
The crystal can be on the motherboard, it does not really matter, as long as the total heat inside the case is large enough to create a skew that can be measured the attack will work.
If you only warm it a little bit, I think the problem is that your skew gets lost in the noise of other people accessing the service. It's tempting to think that other people's accesses are "perfectly uniform noise", but that's not the type of pattern people see in real web services. They get hit in waves most of the time.
If a service gets hit by a wave while you're measuring some suspect server, here's your false positive right there.
Nice paper but somehow I think this tactic would neither work out well in practice, nor work in court as a proof.
All that means is that you need to sample over a longer period.
And it does not have to work 'in court as a proof' to be a practically viable attack, and they are well beyond theory:
"Implementing this is non-trivial as QoS must not only be
guaranteed by the host (e.g. CPU resources), but by its network
too. Also, the impact on performance would likely be
substantial, as many connections will spend much of their
time idle. Whereas currently the idle time would be given to
other streams, now the host carrying such a stream cannot
reallocate any resources, thus opening a DoS vulnerability.
However, there may be some suitable compromise, for example
dynamic limits which change sufficiently slowly that
they leak little information.
Even if such a defence were in place, our temperature attacks
would still be effective. While changes in one network
connection will not affect any other connections, clock skew
is altered. This is because the CPU will remain idle during
the slot allocated to a connection without pending data.
Unless steps are taken to defend against our attacks, the
reduced CPU load will lower temperature and hence affect
clock skew. To stabilise temperature, computers could be
modified to use expensive oven controlled crystal oscillators
(OCXO), or always run at maximum CPU load. External
access to timing information could be restricted or jittered,
but unless all incoming connections were blocked, extensive
changes would be required to hide low level information such
as packet emission triggered by timer interrupts.
While the above experiments were on Tor, we stress that
our techniques apply to any system that hides load through
maintaining QoS guarantees. Also, there is no need for the
anonymity service to be the cause of the load."