If I was in a position to be concerned, I'd keep Wireshark open and go to the elevator and see if the music was in sync/starting at the same time. If it wasn't, that would send up some red flags for me.
Again, if I were in a position to be concerned - I'd move hotels with Wireshark actively monitoring and verify the network traffic dropped when I left the WiFi range, and also what kind of UDP/network traffic was at the next hotel.
But if I were in a position to be extremely concerned, I'd probably just throw everything away to begin with, including the clothes I was wearing, buy a laptop/ new clothes, and then, after escaping out the back of one of the stores and getting picked up by a random taxi service and driven a good distance, go to a hotel without an elevator and check the Wireshark traffic.
Unless it was some sort of very sophisticated monitoring, I would hope one of these strategies would provide some answers.
okay this is way over my head - but this would have to be steganography hiding in the audio file that could only be run by someone like OP, detecting; downloading; etc the udp data, right?
Today (but perhaps slightly less in 2016, not sure) you could easily imagine a microcontroller (or FPGA) with a microphone that bugs you, but encodes that audio (using steganography) onto a canned audio file of elevator music, and then sends the result over the network "in the open".
To a casual observer snooping the relevant network, it would probably (as here) look like elevator music, but to the intended recipient who can decode the steganography, it would be a covert listening device.
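For anyone who wants to see what "encoding onto a canned audio file" can mean, here is a minimal sketch of least-significant-bit embedding, which is just one common steganography technique and not necessarily what a real bug would use. It assumes the carrier is a list of 16-bit PCM samples; `embed_bits`, `extract_bits`, and the sample values are made up for illustration.

```python
def embed_bits(samples, bits):
    """Overwrite the least significant bit of each sample with one payload bit."""
    if len(bits) > len(samples):
        raise ValueError("payload is longer than the carrier")
    out = list(samples)
    for i, bit in enumerate(bits):
        # Clear bit 0, then set it to the payload bit. Each sample changes by at most 1.
        out[i] = (out[i] & ~1) | bit
    return out


def extract_bits(samples, count):
    """Read back the first `count` payload bits from the carrier."""
    return [s & 1 for s in samples[:count]]


# Hypothetical carrier (elevator music samples) and payload (bugged audio bits).
carrier = [1000, -1001, 2002, 303, -4, 5, 66, -777]
secret = [1, 0, 1, 1, 0, 0, 1, 0]
stego = embed_bits(carrier, secret)
recovered = extract_bits(stego, len(secret))
```

Each sample moves by at most one quantization step, which is why the result still sounds like the original music. This only survives if the audio is sent uncompressed or losslessly; a lossy codec such as MP3 would destroy the low bits.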
Even better than a canned audio file would be machine generated music. Otherwise you could detect that the “same” song is being transmitted with slightly different bits.
Or you could have an extremely long audio file so the repeat situation doesn’t occur.
There's a very simple way around this. Grab any encoding that uses a dictionary, as I think zip does. Send tiny zip files with an excerpt of a .wav file or something that needs to be compressed.
The decompressed data is always the same, but the data in the dictionary used is where you store your sneaky bits.
Sure, that's still mildly suspicious. But way less than the actual music data changing all the time.
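A rough sketch of the general idea (same decompressed data, different compressed bytes), using Python's `zlib` module. This is a simpler stand-in than hiding bits in the dictionary itself: it hides one bit per chunk in the choice of compression level, which the zlib header records as a hint (the FLEVEL field of RFC 1950). The `send`/`receive` names and the level-to-bit mapping are assumptions for illustration.

```python
import zlib

# Assumed mapping: compression level 1 carries a 0 bit, level 9 carries a 1 bit.
LEVEL_FOR_BIT = {0: 1, 1: 9}


def send(chunk: bytes, bit: int) -> bytes:
    """Compress the chunk, choosing the level according to the hidden bit."""
    return zlib.compress(chunk, LEVEL_FOR_BIT[bit])


def receive(blob: bytes):
    """Return (decompressed chunk, hidden bit)."""
    # The top two bits of the second header byte are the FLEVEL hint:
    # 0 for the fastest setting, 3 for the maximum-compression setting.
    flevel = blob[1] >> 6
    return zlib.decompress(blob), 0 if flevel == 0 else 1


audio_excerpt = b"RIFF....WAVEfmt " * 32  # placeholder bytes, not a real .wav
```

Both variants decompress to identical bytes, so anyone comparing the decoded audio sees nothing change. A header flag is obviously easy to spot if someone looks for it; hiding the bits in the encoder's choices deeper in the stream would be harder to notice but needs a custom compressor.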
You could also hide the bits in, e.g., the timing of packet transmission, or by omitting an expected packet every once in a while; it'll just look like UDP dropped a packet.
You can use a channel with lots of noise, because you can use error-correcting codes to restore the intended message.
(To elaborate with an example: sometimes a packet might already drop randomly, or timings might be slightly delayed anyway.)
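To make the error-correction point concrete, here is a sketch using the simplest possible code, a repetition code with majority voting. Real systems would use something far more efficient (Reed-Solomon, convolutional codes, and so on); the repeat factor of 5 and the function names are assumptions. Think of each symbol as "was the expected packet deliberately dropped or not".

```python
def encode(bits, n=5):
    """Repeat every message bit n times."""
    return [b for b in bits for _ in range(n)]


def decode(symbols, n=5):
    """Majority-vote each group of n received symbols back into one bit."""
    return [
        1 if sum(symbols[i:i + n]) * 2 > n else 0
        for i in range(0, len(symbols), n)
    ]


def flip(symbols, positions):
    """Simulate channel noise: invert the symbols at the given indices."""
    noisy = list(symbols)
    for p in positions:
        noisy[p] ^= 1
    return noisy


message = [1, 0, 1, 1]
sent = encode(message)
# Naturally occurring drops/delays corrupt five of the twenty symbols.
received = flip(sent, [0, 1, 7, 12, 18])
decoded = decode(received)
```

With n=5, up to two corrupted symbols per group are tolerated, so random packet loss just lowers the covert channel's bandwidth rather than breaking it.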
If you have a legacy music system and encode it over the network, the data will vary: time bases won't line up, there will be noise, etc., even if it repeats.
Yeah, that's the problem. If the song is known, either because it's a previously published song or because it repeats, then the bits should be the same every time the song is played. If you are adding extra data with steganography then you don't want it to be easy to detect, because someone could make a steganography detector by testing if the same song's data seems to vary for no reason.
I guess you could also build an auto-auto-tune or auto-remix solution so the songs always have justifiable variation without needing fully generated music.
I know next to nothing about steganography, beyond the general concept, so please excuse my ignorance if I'm way off base, but couldn't you design the encoding system such that you could encode the same amount of data within the time frame of each loop, with only some portion of the new data containing real data (audio recording)?
This would work if the elevator music was a never-repeating stream. With most elevator music, it's a few minutes of a "song" playing on repeat 24/7, so if you recorded a few repetitions of the "song", you'd probably also see repeated packets on the network.
If the music was repeating but the stream was different all the time, then steganography could be the reason :)
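A sketch of that check: hash each repetition of the loop and see whether the digests agree. It assumes you already stripped per-packet headers (sequence numbers and timestamps always change), know the loop length in packets, and started capturing on a loop boundary. As noted upthread, it also only means anything if the source is a digital file going through a deterministic encoder. `loop_digests` and `looks_static` are made-up names.

```python
import hashlib


def loop_digests(payloads, loop_len):
    """Return one SHA-256 digest per complete loop of `loop_len` payloads."""
    return [
        hashlib.sha256(b"".join(payloads[i:i + loop_len])).hexdigest()
        for i in range(0, len(payloads) - loop_len + 1, loop_len)
    ]


def looks_static(payloads, loop_len):
    """True if every captured repetition of the song is bit-identical."""
    return len(set(loop_digests(payloads, loop_len))) == 1


# Toy capture: a three-packet "song" recorded three times.
clean = [b"a", b"b", b"c"] * 3
# Same capture with one payload altered in the second repetition.
tampered = clean[:4] + [b"B"] + clean[5:]
```

If `looks_static` comes back False for a source you know is a fixed file, that is the "varies for no reason" signal; it is not proof of steganography on its own.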
Basically, you'd use steganography to hide that you send a message. And that message would be encrypted. You can use almost any standard encryption scheme, as long as you remove headers etc. Any ciphertext of a crypto-system worth its salt will be indistinguishable from random noise without the key.
More than a red herring, the multicast could be valuable because it obscures who's actually listening. Everyone on the network is receiving these packets so there's no way to single anyone out. Seems like a great move for spy software, 100% plausible deniability. (but I really doubt that's happening here)
My memory is a bit hazy on this one - but I used to run the engineering team for a company that did multicast based IPTV for hotels about ~2003 or so and I'm pretty sure the set top boxes used IGMP to control what video streams were sent to them - all devices on the fibre backbone got all the streams but each device in the rooms (connected by copper) could only handle a single stream.
So multicast doesn't necessarily mean that every device gets every packet... I think.
Also probably not worth using IGMP for audio.... :-)
The best red herring would be to publish a blog post a couple years before you do this, where the author concludes that the traffic is harmless elevator music.
I’ve done this to make music remotely on the CLI from my phone SSHd into a VPS, forwarding pulseaudio/JACK and listening to the stream in a browser in the background. fun times
Nice! I assume if it was over the WAN you probably had to unicast it?
I've experimented a tiny bit with the builtin pulseaudio RTP sink/source, but I've not used it extensively. I'm not sure if cutting out ffmpeg would be beneficial.
Actually the multicast audio is a pretty good system. There're multiple elevators. Broadcasting makes it simple to sync up the music on them. Using a Wifi speaker has much lower cost than adding a wired speaker to a moving elevator. It’s also simple and low cost to add extra WiFi speakers in other areas of the hotel, creating an ad hoc PA system.
I worked on a WiFi multicast video streaming solution and while it theoretically works as well and is as easy as you describe in practice it can be a complete nightmare.
Full disclosure this was a few years ago so things may have improved. I also can’t remember all of the specifics but there was a lot of low-level driver work, firmware tweaks, specific configurations of just about every WiFi param you can think of, etc.
Sonic used to offer mbone connectivity (including BBC channels) back when they were an internet service provider and not a web provider. It was pretty nifty but never very user friendly to set up.
When I worked at a core i2 .edu site, we had all the "hd" mbone tv streams. It was pretty wild considering HD was a "new" thing! Thanks for reminding me of that wonderful time!
I wanna say a special welcome to everyone that's, uh, climbed into the
Internet tonight and, uh, has got into the M-bone. And I hope it doesn't
all collapse.
There's nothing in the article that suggests that Wi-Fi is being used on the elevator. In fact I'd say most likely the elevator is using a wired Ethernet connection. It's just that the broadcast domain for the L2 network includes both the wired elevators and guest Wi-Fi.
Unless I misread, there's nothing that suggests Wi-Fi is involved at all. It rather sounds to me like the author was listening at the Ethernet port that the TV was plugged in (making this less of an issue, as some people in this thread thought). But this is not entirely clear to me...
> there's nothing that suggests Wi-Fi is involved at all. It rather sounds to me like the author was listening at the Ethernet port that the TV was plugged in
Author here. Everything in the article was received on the guest Wi-Fi network on my laptop without plugging into anything.
> The thing is multicast is not anycast, you will not receive multicast traffic unless you specifically ask to join a group.
Most likely a dumb cheap switch that doesn't snoop on IGMP (or a less dumb switch that wasn't configured to snoop on IGMP) was upstream of both the OP and the device in the elevator. So the frames were being flooded since the switch doesn't know any better, and then normally ignored by OP's network stack since they didn't join that multicast group, until something like tcpdump/wireshark enables promiscuous mode.
Remember that networks introduce latency. It might be tiny but the human ear can detect speakers being _slightly_ off.
For example you wouldn't want a wifi speaker in an elevator using a repeater at the top of the shaft trying to match up to a hardwired speaker in a ground floor vestibule.
You can use NTP to get the devices' clocks synced up to much better than necessary tolerance, and play back accordingly.
And then you "just" have the same problems that you have with purely electrically connected, analogue speakers (which are effectively 100% in sync in terms of receiving the signal): Sound is relatively slow, and so the audio from a speaker that is far away will reach you later than the nearby speaker.
You can mitigate that by adding a precise delay to the far away speaker... but of course that does not work if you're standing on the other side. Nevertheless, as said, that problem is regardless of whether your speaker is network-connected or not.
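For a sense of scale, a small worked example of the acoustic delay involved. It assumes sound travels at roughly 343 m/s (dry air at about 20 °C); the function names and distances are illustrative only.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at roughly 20 degrees C


def acoustic_delay_ms(distance_m):
    """Time for sound to travel `distance_m` metres, in milliseconds."""
    return distance_m / SPEED_OF_SOUND_M_S * 1000.0


def alignment_delay_ms(first_arrival_m, second_arrival_m):
    """Electronic delay to add to whichever speaker would be heard first,
    so both arrivals line up at one specific listening position."""
    return acoustic_delay_ms(second_arrival_m) - acoustic_delay_ms(first_arrival_m)


per_foot = acoustic_delay_ms(0.3048)      # one foot
example = alignment_delay_ms(3.0, 20.0)   # listener 3 m from one speaker, 20 m from the other
```

One foot works out to a bit under 1 ms, and the 3 m versus 20 m case needs about 50 ms of added delay. The correction is only right for that one listening position, which is the limitation described above.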
Kind of. The bigger problem you will have if you try this is that the audio is not clocked by the system clock, and the audio clock is almost always free-running (and even if it were derived from the system clock, NTP et al. don't generally discipline the clock itself, just the OS's presentation of it). So in the case of long-running playback (or continuous, as in this case), you will drift out of sync over time, and it doesn't take that long to become noticeable. And at some point you'll start dropping out due to either buffer underflow or buffer overflow. So you do still need to take care of this.
So to work well you do need to resync the audio to the local audio clock using a sample rate converter, or build some custom hardware that lets you sync the playback audio clocks somehow. Or if you want to be sloppy about it, keep close track and stuff or drop individual samples as you drift.
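Some back-of-the-envelope numbers for the drift, as a sketch. The 50 ppm mismatch between two audio clocks is an assumption (crystal tolerances are typically in the tens of ppm), as is the 48 kHz sample rate.

```python
def drift_ms(ppm_offset, seconds):
    """Accumulated offset between two free-running clocks after `seconds`."""
    return ppm_offset * 1e-6 * seconds * 1000.0


def samples_to_adjust_per_second(ppm_offset, sample_rate=48000):
    """Average number of samples per second to stuff or drop to stay in sync."""
    return ppm_offset * 1e-6 * sample_rate


after_ten_minutes = drift_ms(50, 600)
correction_rate = samples_to_adjust_per_second(50)
```

At 50 ppm the two speakers are about 30 ms apart after ten minutes, which is already audible as an echo, yet the sloppy fix only needs a couple of samples stuffed or dropped per second.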
Sonos has a remarkably good implementation of all of this.
For URL-based streams they buffer and NTP to sync. For live streams (e.g. gaming) they p2p multicast and tweak the wifi params in real-time to minimize drops.
The speakers create their own wifi and use MST network heuristics to latency-min route over that versus native wifi or ethernet if you've plugged it in. Sound drops when the wifi spectrum blinks (rarely), but I have never encountered the speakers being out of sync or noticing an echo effect.
And the speakers can use your phone's mic to scan the soundscape of a room to acoustically balance the sound when you set them up. I particularly like how consistent the sound volume is room-to-room even with very different speaker setups.
IIRC they've patented their specific mechanism. So ya, it's solved, but it may be expensive to license.
(Not affiliated with Sonos, I just have a bunch of them and like them a lot.)
Yeah, Sonos is very much the Apple of this space. A solid, user-friendly implementation of several pre-existing concepts into a cohesive product - no small task. I don't think the technologically important parts of this are patentable though, there's both prior art and the obviousness standard to worry about. But very much like Apple's 'rounded corners' case, they've gone after (IMO) obvious UI functionality for such a system to extract money from their competitors.
If you are just interested in the synchronized Audio-over-Ethernet part, AES67 is the industry standard, and a pretty complete open-source implementation can be found at https://github.com/bondagit/aes67-linux-daemon . AES67 is itself a composition of existing standards: fundamentally it is mostly SDP for session description, RTP for media, and PTP for clock sync, so you can build it out of a variety of implementations too.
The patent actually covers a mechanism for electing a master controller for synching and storing configuration parameters. The actual process of synching audio is not covered. Not that difficult to work around the patent. But definitely easy to trip over the patent if you're not careful.
True, it was definitely simplified. But yeah, in cases where you really care, there's a bunch of options to do it completely/sufficiently in sync. (A true asynchronous sample rate converter, as it would have to be here, might be a bit expensive, but simple interpolation, or even stuffing/dropping, might be sufficient for this particular use case.)
Just re-sync at the start of each song. Sound propagating through air introduces ~ 1ms of latency per foot. So if tracks drift out of sync by a few milliseconds, it's no big deal.
That is one solution, and in some scenarios it might not even be noticeable, but it's basically conceding the problem and accepting a guaranteed audio dropout at the end of every 'song', since for this to work you need some dead time to ensure all buffers are drained and start the new stream.
The simplest model is a source that generates a continuous audio stream, and a sink that plays it back; adding the idea of songs complicates the model, and in some use cases might be totally inappropriate. For elevator music, sure it likely doesn't matter, and maybe you can hide it in a crossfade or something with enough metadata, but this is probably part of a system where you put audio into one device connected to the network, that might include live stuff like PA announcements, and it comes out a bunch of other ones, not a dedicated elevator music system.
I don't know much about audio encoding, but do the speakers not have to buffer the incoming packets? Large enough buffer size would introduce drift between speakers even if everything is fine network-wise.
Just make sure they have a large enough buffer, and buffer enough, so that all speakers can play the same frames at the same time (or with the exact delay you want for each specific speaker).
You only care about delay between the speakers, not about what latency any speaker has relative to the source.
A fairly typical and simple approach is to set an intentional, fixed delay, say 500ms, to absorb network latency / inconsistency. The sender sends a target playback timestamp ~500ms in the future with each block of audio. Then the actual delay at the playback side can expand or contract as necessary to take up network delay. The lower you make this delay, the more care you need to take on the network side to guarantee timely delivery.
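A minimal sketch of that scheme, assuming sender and receivers already share a synchronized clock (via NTP or PTP) and that all times are integer milliseconds on that clock. The 500 ms figure comes from the description above; the function names are made up.

```python
PLAYOUT_DELAY_MS = 500  # fixed budget for network latency and jitter


def stamp(capture_ms):
    """Sender side: target playback time to attach to a block of audio."""
    return capture_ms + PLAYOUT_DELAY_MS


def schedule(play_at_ms, arrival_ms):
    """Receiver side: how long to hold the block before playing it.
    Returns None if the block arrived after its playback time (too late)."""
    wait = play_at_ms - arrival_ms
    return wait if wait >= 0 else None


play_at = stamp(1000)                       # block captured at t=1000
wait_fast = schedule(play_at, 1000 + 20)    # speaker behind a fast link
wait_slow = schedule(play_at, 1000 + 180)   # speaker behind a slow link
too_late = schedule(play_at, 1000 + 600)    # network delay exceeded the budget
```

Both speakers play the block at t=1500 despite very different network delays; they just hold it for different lengths of time. A smaller `PLAYOUT_DELAY_MS` means lower end-to-end latency but more blocks that show up too late.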
NTP is accurate enough for this, but I think most of the modern protocols in the wild e.g. AES67, AirPlay2 are using PTP. It is both more accurate and in some ways simpler for this use case.
I'm not sure if the habit is because of the show or not, but I wanted to note for anyone who might be reading that "Detectorists" is a really charming British show that I highly recommend.
" There was obviously data in this packet and file should know when it sees MPEG Audio data, so I decided to write another Python script to save the packet data with offsets. This way it would save the file test1 skipping 1 byte from the packet, test2 skipping 2 bytes and so on. Here’s the code I used and the result."
I was once in a shopping center parking lot where what sounded like a private conversation was being broadcast at an al fresco dining establishment on one side.
It makes you think.
We can't assume that any given hotel network is well-configured, but most enterprise networking equipment verifies the source of multicast traffic against the multicast routing tables. This means that if you simply send packets with a source address matching the real multicast source, the network devices will ignore and not forward them. This reverse-path check is a standardized part of PIM, the most common protocol that network devices use to communicate multicast groups between each other. It's also enabled by default on Cisco devices for local groups and I would assume the same of other vendors.
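A toy model of the reverse-path check, to show why a spoofed source gets dropped. This is a sketch of the idea only, not how any particular vendor implements it; the routing table, interface names, and addresses are hypothetical.

```python
import ipaddress


def rpf_interface(routes, source):
    """Interface the router would use to reach `source` (longest-prefix match)."""
    addr = ipaddress.ip_address(source)
    matches = [(net, iface) for net, iface in routes if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def rpf_accept(routes, source, arrival_iface):
    """Forward a multicast packet only if it arrived on the interface
    that leads back toward its claimed source."""
    expected = rpf_interface(routes, source)
    return expected is not None and expected == arrival_iface


# Hypothetical table: the audio server's subnet is reached via "uplink",
# guest Wi-Fi clients sit behind "guest".
routes = [
    (ipaddress.ip_network("10.0.0.0/8"), "uplink"),
    (ipaddress.ip_network("10.20.0.0/16"), "guest"),
]
genuine = rpf_accept(routes, "10.1.1.5", "uplink")
spoofed = rpf_accept(routes, "10.1.1.5", "guest")
```

The spoofed packet claims the server's address but shows up on the guest-side interface, so it fails the check. Note this only applies where a router is doing the forwarding; it does nothing against a sender on the same flat L2 segment as the speakers.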
That said, it's considered a best practice (although not really all that common) to use ipsec or another method to provide cryptographic authentication of multicast packets. The protocol discussed here may do so.
This reminded me of an evil prank I did on some friends of mine when in university - CD burners had just become a thing, and at the local concert venue where we volunteered, a handful of burned CDs soon appeared at the mixing console with various music the engineers liked to listen to while getting ready for a gig.
Anyway, I ripped the discs, added a nice 50Hz hum under the music and burned new copies which I then left by the CD player.
Yup. Cue frustrated sound engineers trying to debug the ground loop which only manifested itself when the CD was playing.
I never dared admit to the prank, but rather swapped the hum CDs for the originals before someone got a chance to investigate this thoroughly...
No one is going to use UDP to bug a room. They're going to encode the audio over prerecorded noise and broadcast on 500/600 MHz, i.e. pretend it's a stage microphone that a musician left on in their bag.
The question is: why? Why distribute the music that way, why not just attach it to a radio directly, or some RPi that will pull the stream, or just have it prerecorded.
Despite its simplicity, seems to be overengineered.
Because distributing music over multicast works very well, and it's rather simple in nature: You get audio frames as UDP packets, you play them. Because it's multicast, they reach where they should.
> why not just attach it to a radio directly
Because you lose control of what's being played, and there are probably not many "elevator music" radio stations.
> or some RPi that will pull the stream
Because then every location that plays the music has to pull an individual stream, quickly saturating the bandwidth for no reason, instead of just subscribing to the multicast stream.
I also would not say "some RPi" is less engineering, and less maintenance, than the current solution that likely uses an off the shelf multicast audio system.
> or just have it prerecorded
Because you lose control of what's being played without a significant replacement step, when you could just play a multicast stream instead.
> Despite its simplicity, seems to be overengineered.
I'm honestly not convinced I read a less "overengineered" solution so far...
> Because you lose control of what's being played, and there are probably not many "elevator music" radio stations.
Radio is still by far the simplest and most fail-safe solution. But not playing an existing radio station. Rather buying a radio transmitter, like the ones used in churches - strong enough to emit FM throughout the hotel, and yet not requiring a radio license. Then the speakers would just be plain speakers with an FM receiver attached - no more wifi/wired infrastructure involved, routers, ability of the speakers to connect to the wifi, ...
I think this might have been a popular solution once, but my guess is that it came with its own set of problems. Hotels often being pretty big (bigger than most churches, which also tend to be relatively open structures), usage of the license free bands from other devices, audible interference (which you don't care about in ham radio or portables as long as you can still understand what's being said, but for music in a hotel even low interference is bad as soon as it's audible)... though admittedly, I have no experience with radio transmitters for that particular use case.
> I'm honestly not convinced I read a less "overengineered" solution so far...
Prerecorded music on an MP3 player sounds like the one simpler solution, though it wouldn't put the elevators in sync with each other or let you control it remotely.
Yes, but if you want to do even as much as not play the music in the middle of the night, now your MP3 player needs a clock. That clock may drift, or it may drop out completely when the power goes out, unless you synchronize that clock over the network. But if you have network on your MP3 player, you may as well let it subscribe to the multicast stream instead of reading the stream from internal storage. If you do not have network, and you want to change the schedule of when your music plays, have fun sending out a maintenance crew to every single one of those devices. And don't forget any...
In the U.S. at least, most jurisdictions will require the music to be shut off in an emergency -- namely, if the fire alarm goes off. Audio sources commonly are required to have a relay/GPIO input that triggers the shutoff, a network command is not good enough (network switches typically not being life safety rated).
In this hotel scenario, you have one audio source that can be located in a location that can be conveniently tied into the fire alarm relay panel. Stop the audio there and you've stopped it everywhere.
This is pretty much the simplest feasible way to do distributed audio over IP, and it's commonly implemented by commercial distributed audio amplifiers.
The advantage of IP distributed audio is that it functions over the existing IP network, so it avoids the need to wire a high-voltage audio system (which will still require multiple amplifiers in large buildings) or dedicated signal-level wiring to distributed amplifiers. It also tends to be more reliable as the IP network is more robust to issues like crosstalk and poor connections that can turn into frustrating troubleshooting on analog systems.
"Some RPi that pulls the stream" is exactly what this system is except it uses multicast for significantly reduced bandwidth usage and the receivers are presumably commercially-supported distributed audio equipment. No hotel wants to be doing patch management for embedded Linux devices.
I don't think multicast significantly reduces bandwidth usage. Traffic needs to go everywhere anyways. It will probably reduce CPU usage since switches (even dumb ones) can just broadcast the packet at line speed, since it's all in the hardware. There's no need for whatever server is distributing the stream to maintain multiple sockets and copy buffers multiple times.
With unicast, the node originating the stream would have n times the traffic, where n is the number of devices pulling the stream. Further from the originating node in the network topology, each edge would transport as many times the traffic as there are listeners whose traffic travels along that edge. That seems very significant.
With broadcast, all the nodes and edges transport the stream only once, but you can only distribute the stream in the same (L3) network, and within that network you transport the stream even through L2 nodes and edges (i.e. network switches, cables, Wi-Fi channels) where there are no listeners at all. You could in theory repeat those broadcast packets into other networks, but you need to set that up more explicitly, and then every node and edge in that network gets the traffic, too.
With multicast, devices and effectively segments can subscribe to the stream, and all nodes and edges transport the stream at most once, and they can do so across network boundaries while retaining the same property, and using a standardized mechanism that is designed to make the traffic only go where it needs to go as much as the topology allows.
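To put rough numbers on the unicast versus multicast comparison, here is a sketch that computes the load on every link of a small tree-shaped network. The topology, the 128 kbps bitrate, and the function names are all assumptions; the multicast column also assumes switches that only forward toward subscribers.

```python
def subtree_listeners(tree, node, listeners):
    """Count listeners at `node` or anywhere below it."""
    count = 1 if node in listeners else 0
    for child in tree.get(node, []):
        count += subtree_listeners(tree, child, listeners)
    return count


def edge_loads(tree, root, listeners, bitrate_kbps):
    """Per-link load in kbps for unicast and for (pruned) multicast delivery."""
    loads = {}
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in tree.get(parent, []):
            n = subtree_listeners(tree, child, listeners)
            loads[(parent, child)] = {
                "unicast": n * bitrate_kbps,               # one copy per listener
                "multicast": bitrate_kbps if n else 0,     # at most one copy
            }
            stack.append(child)
    return loads


# Hypothetical hotel: three elevator speakers behind sw1, a TV behind sw2.
tree = {
    "server": ["core"],
    "core": ["sw1", "sw2"],
    "sw1": ["elev1", "elev2", "elev3"],
    "sw2": ["tv"],
}
loads = edge_loads(tree, "server", {"elev1", "elev2", "elev3"}, 128)
```

With three listeners the server's uplink carries 384 kbps for unicast and 128 kbps for multicast, and the gap grows linearly with the number of speakers. The link toward `sw2` carries nothing in either case here; with a dumb switch that floods multicast it would carry the full 128 kbps.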
anyfoo is correct but it might help to explain the implementation. Multicast IP works in cooperation with IGMP, a protocol that manages "groups" at the IP layer. In a unicast model, a device that wants a stream asks the server for the stream, and the server starts sending the stream to that device. The server maintains n outgoing connections sending n copies of the stream, one for each subscriber.
With multicast, a device that wants to receive the stream uses IGMP to join the group. The IGMP communication is not with the server, but actually with the router serving the subnetwork. Additionally, larger commercial switches usually implement "IGMP snooping" in which the switch "listens in" on IGMP sessions between its clients and the upstream router. The server maintains only one connection and sends only one copy of the stream, to a multicast group address---there are IP ranges reserved for this purpose. The router, and switches which implement IGMP snooping (or layer 3 switches, there are some variations), forward traffic to the multicast group address on any interface on which a client has used IGMP to join the group. Switches without IGMP support will just forward it on all interfaces. The result is that, at each point in the network, only one copy of the stream is handled. This has significant performance benefits for both the server and the network devices.
As the name implies, multicast is much like broadcast except that devices can opt in or out of receiving the broadcast, and network devices can use knowledge of those IGMP sessions to avoid sending multicast traffic on interfaces where no one cares to receive it. That said, multicast traffic going to network segments where it's not used is not especially harmful besides wasting some capacity on that network segment.
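From the receiver's side, "using IGMP to join the group" is a single socket option; the kernel sends the IGMP membership report for you. A minimal IPv4 sketch using Python's standard `socket` module follows. The group address 239.1.2.3 and port 5004 are made up (239.0.0.0/8 is the administratively scoped range), not the ones from the article.

```python
import socket
import struct


def build_mreq(group, interface="0.0.0.0"):
    """Pack the ip_mreq structure: group address + local interface address.
    0.0.0.0 lets the OS pick the interface."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(interface))


def open_receiver(group, port):
    """Bind a UDP socket and join the multicast group on it."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # This setsockopt is what triggers the IGMP join toward the router/switch.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, build_mreq(group))
    return sock


if __name__ == "__main__":
    receiver = open_receiver("239.1.2.3", 5004)  # hypothetical group and port
    data, sender = receiver.recvfrom(2048)       # blocks until a packet arrives
    print(len(data), "bytes from", sender)
```

On a network with IGMP snooping, packets for the group only start arriving on your port after this join. On a network that floods multicast, they were reaching your interface all along and the join just tells your own stack to stop ignoring them.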
Unfortunately multicast is not workable over the internet for reasons which are difficult to overcome, or IPTV and other synchronous media streaming services would be far less costly to run. On an institutional network, though, multicast can be used to great effect. It's a fairly old method as well. In my first IT job we used to install the operating system on workstations by PXE booting them to an imaging tool and then multicasting the disk image across the network... this way we could do hundreds of machines at once at just about disk saturation rate. This kind of thing isn't readily achievable without the use of multicast or broadcast traffic. The tool was Norton Ghost, which has apparently supported this mode of operation since 1998!
As long as it's segregated that way. Someone was able to gain access to their Dr's office's network by accessing the wifi-enabled fish tank thermometer.
I forget the details of how exactly they accessed it, but it was an example of an Internet of Things device making a security hole in a network.
That's more of an annoyance than a safety issue though. This is definitely one of those YAGNI things where it's not worth worrying about until it becomes a recurring issue, which is highly unlikely.
Personally, even the remotest possibility of someone broadcasting "evacuate your rooms, there is a fire/active shooter in the building" seems worth spending a few hours protecting against.
>>Do you think hotels aren't protected, to some extent
How are they protected against that, exactly? You can literally walk up to any fire emergency button on any wall on any floor, press a button and evacuate the entire hotel, why bother with this UDP streaming nonsense?
The threat to the perpetrator -- of 90 days prison time and a permanent criminal record of being a mischief-maker -- prevents people from pulling the alarm.
Sure, and to circle all the way back to the original point several posts up - why is this a deterrent to someone pulling a fire alarm but not for someone sending a fake UDP broadcast? The penalty will be exactly the same.
Harder to track down the person. Unless the hotel is logging every packet on its network and paying to archive the TBs of encrypted video streaming data that goes through every day. And it's a purely local network, so not like the NSA can help out.
Edit:
"Unauthorized" computer access is a serious federal crime under the CFAA, and that you did it as a joke is not a legal defense. Famous examples:
Hotels are reasonably protected against such an attack. There are almost always cameras in the elevator, and the electronics are typically, to some extent, tamper-resistant.
It's multicast, not broadcast (which means the networks could still be separated, among other things), and it's not too surprising that the network for the entertainment system carries this (by today's standard) low bandwidth stream.
Yeah, with a better/better configured switch it could not reach the individual client unless it subscribed to the multicast stream, but again, who cares about that low bandwidth stream... you should not trust any hotel network anyway, those audio packets won't do extra harm.
It's also highly likely that they do not care about more things if they do not care about this.
Sure it's an elevator speaker and probably you can't do much (probably! 'simple' devices are packing ever more hardware).
But something is sending those packets. What's that? Can it be hacked? What else can that device/server see? Are there other devices sharing the network with customers?
Which rules? Works for me on Nextdns.
I have the “NextDNS Ads & Trackers” and “Adguard DNS” blocklists enabled, and I think I've also enabled every non-beta feature under Security.
Joke's on you, it's a bug listening to you in your room while using steganography to merely appear to be elevator music!