Reverse engineering my router's firmware with binwalk (embeddedbits.org)
717 points by sprado on Feb 6, 2020 | 87 comments



This is indeed a cool tool! I've used it before when forensically analyzing a cell phone, and found interesting things. For example, I found that a web browser had cached the unencrypted bytes from an HTTP message. Binwalk identified the gzip header's magic number (1f 8b), and after decompression there were interesting results.
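As a rough illustration of how a signature hit like that can be followed up, here is a minimal Python sketch. It is not binwalk's implementation; the function name `find_gzip_streams` and the sample data are made up for this example. It looks for the gzip magic bytes and keeps only the offsets where a stream actually decompresses.

```python
import gzip
import zlib

# 1f 8b is the gzip magic number; 08 is the usual "deflate" method byte.
GZIP_MAGIC = b"\x1f\x8b\x08"


def find_gzip_streams(blob: bytes):
    """Return (offset, decompressed_bytes) for each gzip stream that decodes."""
    found = []
    pos = blob.find(GZIP_MAGIC)
    while pos != -1:
        # wbits=31 makes zlib expect a gzip wrapper. The decompressor stops
        # at the end of the stream and leaves trailing bytes in unused_data.
        decoder = zlib.decompressobj(wbits=31)
        try:
            data = decoder.decompress(blob[pos:])
            if decoder.eof:
                found.append((pos, data))
        except zlib.error:
            pass  # false positive: the magic bytes appeared by chance
        pos = blob.find(GZIP_MAGIC, pos + 1)
    return found


if __name__ == "__main__":
    sample = b"\x00" * 16 + gzip.compress(b"GET /index.html HTTP/1.1") + b"\xff" * 8
    for offset, data in find_gzip_streams(sample):
        print(hex(offset), data)
```

Checking that the stream really decodes is what separates a useful hit from three bytes that happen to match.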

Another cool tool I learned about recently is signsrch. It's more for reverse engineering binaries of software that implements encryption of some type. It'll find signatures in the binaries of these encryption methods, giving you a place to look when, for example, reverse engineering a file format that you suspect is encrypted in some way.

https://www.oreilly.com/library/view/learning-malware-analys...


Cool tool! I wrote something for reverse-engineering code, as a consultant years ago. They had a radio module but the manufacturer had lost the source code.

So the tool was called Golem. It had tables for defining opcode to assembler pattern matching, that could be written for any machine (instead of just the one I was cracking).

It worked iteratively. You ran it over the binary once, it produced arbitrary labels from jump-points. You could annotate that output by changing the labels to something human-readable (e.g. Loop-back, Main, TimerISR etc) and add comments.

The next iteration would read that back in to build a symbol table, rescan the binary and re-output. But this time it would understand that the symbols were always on opcode boundaries, distinguish data table from code entry points (because you marked them) etc. So it would do a better job of staying in sync with the code.

Once I was done with that project (and had re-compilable source for the radio module) I put it away and never thought of it again.


You were on your way to cloning IDA Pro, Ghidra, Binary Ninja, or Hopper Disassembler. To varying degrees, sometimes as a pay-extra option, those tools can produce source code.


Um. I think they post-dated me! But I didn't go anywhere with it.


IDA Pro started as a 16-bit MS-DOS program. It's real old. I'm pretty sure I was using it back in 1992, when it was already a well-developed program.

Ghidra is old too, although only recently public. It couldn't be older than Java, which is from 1996.


Cool. I did mine in 2006. Hey, those have mostly Intel disassemblers. Mine did any machine code you cared to write a dissector for.

Are they iterative? Can you add human clues/cues so they do a better job the next time?


They are not at all mostly Intel disassemblers, though some of them have freeware versions (to suppress competition) or time-limited demo versions that are purposely limited. They are very much designed around humans adding clues: you can declare function parameters, struct types, enumerations, and the meaning of various offsets in code. They are interactive GUI tools, continuously updating automated analysis as the user assists by providing clues to the analysis engine. Ghidra and Binary Ninja can be simultaneously multi-user, storing the database on a server for collaboration.

IDA Pro supports dozens of processor architectures. I count about 70, not including model variations and not including community support. https://www.hex-rays.com/products/ida/processors/

Ghidra supports "X86 16/32/64, ARM/AARCH64, PowerPC 32/64/VLE, MIPS 16/32/64/micro, 68xxx, Java / DEX bytecode, PA-RISC, PIC 12/16/17/18/24, Sparc 32/64, CR16C, Z80, 6502, 8051, MSP430, AVR8, AVR32, and variants of these processors."

Binary Ninja officially supports x86, x64, ARMv7, Thumb2, ARMv8, PowerPC, MIPS, 6502. Community support adds AVR, MSP430, and VMNDH-2k12.

Hopper Disassembler supports "x86{16,32,64}, Dalvik, avr, ARM, java, PowerPC, Sparc, MIPS"


ida handles any arch

it is interactive (so by definition iterative)


I remember those old times too. The first IDA was written in Pascal using Turbo Vision (GUI library).

Then IDA moved to Windows, and today it's multiplatform.


It's a good article but there are much easier ways to use binwalk than presented here.

In the first example he uses the "--signature" and "--term" flags, which are unnecessary. Running binwalk with no flags will produce the same output.

To extract part of the file, he also uses dd with the "skip" and "count" options painfully calculated. You can just use:

binwalk --dd='.*' img.bin

and it will extract everything that matches the pattern - the pattern above will extract all found files.


Just a quick note to be careful extracting what binwalk considers to be 'everything' (such as the pattern above, or a -e for known file types) on larger files. Sometimes there will be more matches than you might expect (such as in a .pcap file). You could magically extract gigabytes of data from a 100MB file, which may be unhelpful and take a long time.


A slightly related question for HNers: Is there any easy tool for a non-cs guy to reverse engineer a binary file containing numbers and text in some specific format?

I have to work with some old structural analysis software. The material and element definitions come in an obscure file format ".PF3CMP". I know it contains text like the material names, and numbers/letters for the material properties.

Ultimately it's my goal to be able to write these files from matlab or python, instead of using the horribly clunky user interface. But first I need to know the structure of the file, and I'm not even sure how to begin figuring that out.

[0] is what it looks like when opened in a hex editor

[0] https://imgur.com/a/jvqV3k8


I don't know of any straightforward tools; most people I've seen reverse engineer a format do it with a hex editor and custom scripts. It's not directly relevant, but the best I've seen is this presentation about reverse engineering the protocol used to communicate within a car: https://www.youtube.com/watch?v=KkgxFplsTnM

It uses some techniques that might be relevant, like monitoring different parts of a file as you make different changes (like accelerating or decelerating). In your case it might be possible to compare between different material definitions for example.


Ok thanks, I'll take a look. It's possible for me to generate these files for each of the various material settings so I can manually 'diff' them, similar to what you're describing.
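The manual diff can be automated fairly easily. A minimal Python sketch, assuming two files saved with a single setting changed between them (the helper name `diff_ranges` and the sample bytes are invented for illustration):

```python
def diff_ranges(a: bytes, b: bytes):
    """Return (start, end) offset pairs where a and b differ; end is exclusive."""
    ranges = []
    start = None
    common = min(len(a), len(b))
    for i in range(common):
        if a[i] != b[i]:
            if start is None:
                start = i  # a run of differing bytes begins here
        elif start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:
        ranges.append((start, common))
    if len(a) != len(b):
        # Anything past the end of the shorter input counts as one more range.
        ranges.append((common, max(len(a), len(b))))
    return ranges


if __name__ == "__main__":
    before = b"PF3CMP\x00\x01steel\x00\x00"
    after = b"PF3CMP\x00\x02steel\x00\x00"
    for start, end in diff_ranges(before, after):
        print(f"0x{start:04x}-0x{end:04x}", before[start:end].hex(), "->", after[start:end].hex())
```

If changing one property moves only a handful of bytes, those offsets are a good first guess for where that property lives.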


It sounds like you might eventually be able to write a kaitai struct [0] for the resulting format which would make it fairly easy to use the format in your language of choice.

[0]: https://kaitai.io/


If there are massive differences with minor changes that can be a clue that the data is compressed or encrypted in some manner.
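A related check, swapped in here as a complement to diffing rather than something this comment describes, is to measure byte entropy: compressed or encrypted data tends to sit close to 8 bits per byte, while plain structured data is usually much lower. A minimal Python sketch (the function name is my own):

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    entropy = 0.0
    for count in Counter(data).values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy


if __name__ == "__main__":
    print(shannon_entropy(b"\x00" * 1024))    # a constant run carries no information
    print(shannon_entropy(bytes(range(256))))  # every byte value equally likely
```

Running it over fixed-size windows of a file, rather than the whole file at once, shows which regions look packed.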

A good test would be if you can name/tag/comment items in the file, you can search for these strings.


I don’t think there’s compression or encryption. I can search and find the hex representation of text and values that I expect to be there. I guess I need to bite the bullet and spend some time tagging the parameters I know, then figuring out the pattern of padding that is in between.
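For the tagging step, a small script that lists every readable string with its offset can save a lot of scrolling in the hex editor. This is a minimal Python sketch in the spirit of the `strings` utility; the sample bytes are invented and say nothing about the real file.

```python
import re


def ascii_strings(blob: bytes, min_len: int = 4):
    """Return (offset, text) pairs for runs of printable ASCII of at least min_len bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(blob)]


if __name__ == "__main__":
    sample = b"PF3CMP\x00\x00\x01Steel A36\x00\x00\x02\x03Conc\x00"
    for offset, text in ascii_strings(sample):
        print(f"0x{offset:04x}  {text}")
```

The gaps between consecutive offsets are then the padding and numeric fields left to decode.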


Have you tried the 'file' command on various *nix systems (can download for Windows too)? It mightn't know this format but I think it will tell you if it finds compressed (zipped) data streams in common formats, which will be your first step since many files have some form of compression.

I'll also echo the other comment about reverse engineering the reading functions. Some formats only include certain structures if necessary so even if you have a lot of files you might be missing some example data to complete the picture.


Depending on how weird the format is, it might be more efficient to reverse-engineer the file-reading routines of that program which can work with these files.


Thanks but this sounds... above my level of computer competence


Something like this may help: https://ide.kaitai.io/, but I've found it a bit overwhelming.

It might be easiest to just start writing a utility that parses it, first making guesses and then refining as you generate and test more files like you mentioned in another reply. You already know what the magic bytes are at the start of the file - PF3CMP.


The Linux tool “od” might help you here. The -c flag will print ASCII characters.

You can get it with WSL on Windows, or even just install git and you’ll get git-bash for another easy option.


If it's helpful in any way, lots of tool-specific file formats like that are basically C structs dumped to a file, then loaded when the file is loaded.
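If the file does turn out to be dumped structs, Python's `struct` module can read and write the records directly. The layout below is entirely hypothetical (a 16-byte name followed by two little-endian doubles); the real field order, sizes, and byte order would have to be worked out from the actual files.

```python
import struct

# HYPOTHETICAL record layout, invented for illustration only:
#   16s  material name, NUL-padded
#   d    first numeric property (e.g. a modulus)
#   d    second numeric property (e.g. a strength)
RECORD = struct.Struct("<16sdd")


def pack_material(name: str, first: float, second: float) -> bytes:
    """Serialize one record; struct pads the name with NUL bytes to 16."""
    return RECORD.pack(name.encode("ascii"), first, second)


def unpack_material(raw: bytes):
    """Parse one record back into (name, first, second)."""
    name, first, second = RECORD.unpack(raw)
    return name.rstrip(b"\x00").decode("ascii"), first, second


if __name__ == "__main__":
    raw = pack_material("Steel A36", 200000.0, 250.0)
    print(len(raw), raw.hex())
    print(unpack_material(raw))
```

Once a guessed layout round-trips against a file the program wrote, the same `Struct` can be used to generate new files.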


related possibly? what domain is this file from?

https://techdocs.broadcom.com/content/broadcom/techdocs/us/e...


thanks but sadly not, it's from a structural analysis program called PERFORM-3D.

I've contacted the developer but they will not release the format of the files to me.


I first found out about binwalk from this YT video on Firmware Reverse Engineering: https://www.youtube.com/watch?v=GIU4yJn2-2A

Quite a good, short intro into the subject as well!


This is amazing! I’ve used binwalk extract for ‘capture the flag’ challenges but I never really thought about the practical applications of it. Wow! Thank you


Funny, I always assumed that there would be no application for binwalk other than for extracting binary firmware images of embedded devices.

Using binwalk for CTF challenges is actually a new insight for me :)


Conversely, it's a convenient tool for obfuscation. You can trigger plausible false positives all over, while also making sure that there's nothing of immediate use with binwalk left.


>Although the firmware was released last year (August 2019) as I write this article, it uses an old Linux kernel version (3.3.8) released in 2012 compiled with a very old GCC version (4.6) also from 2012!

This is what happens when you pay peanuts for embedded devs and outsource development to the cheapest sweatshop you can find so your products can meet a competitive price point.

Sadly this will not change until there's regulation in place to hold manufacturers accountable for their massively obvious vulnerabilities since nobody cares that they're flooding the market with potential botnet hosts when they're overworked, paid miserably and have a manager constantly breathing down their neck.


It is mostly related to the drivers for the SoC, not about paying devs.


Exactly. What I see is that the SoC provider just freezes everything at a given version and supports just that. For example I am currently building Android 9 on a QCOM SoC with a 4.9 Kernel. I don't think it will receive any future update...


So how did OpenWRT manage to build firmware with up to date components for it? The Qualcomm chips inside of it seem fairly modern for such an old kernel.


OpenWRT doesn't guarantee support of all hardware. I have a router flashed to a certain version with a newer kernel, and the Wifi doesn't work because no driver is available.


Coincidentally, this very router's OpenWRT isn't as fast as the manufacturer's firmware because it doesn't have proper drivers for the hardware NAT, so it has to do it in software.


Qualcomm networking chips mostly have open-source drivers, which are mostly upstreamed to the mainline Linux kernel releases. So there's nothing really holding back kernel versions on that hardware, other than the regular maintenance burden of keeping a Linux distro up to date—which few OEMs are interested in doing. Broadcom by contrast is less friendly to open-source, and their closed-source drivers can preclude kernel updates.


Note that openwrt has a big community of contributors and not all devices/features are supported. In contrast the manufacturer firmware is at least feature complete and easy for regular users to set up.


OpenWrt is also free. Both as free software, and free of cost. When you're paying a manufacturer for a product, surely it's not too much to expect them to ship with functional software that also happens to be up-to-date and secure?


You can get that, but not at consumer-grade router prices. I have a separate router that I put behind my stand-alone cable modem. I paid about $200.00 for that separate router, and another $100 for the modem. A wifi access point cost me another $100.

So it's about $400.00 for a router that has updated firmware (pfSense). Or you can cheap out and spend only $100.00. This is what you get by doing that.


Support varies; you should purchase devices that include hardware which is supported by the Open Source drivers (even if you have to compromise and it still uses some small blobs that are free to distribute).

You should also purchase a device that includes enough storage space and RAM to support more than the bare minimum; that will help keep things future proof.


Because OpenWrt doesn't care if some feature doesn't work, but the OEM should support all features.


Most routers run 2.6.32 kernel.


I have the feeling most home routers are designed by the same OEM shop in Shenzhen.


Is there any particular reason for this? Like some feature that was removed in later versions?


The primary reason is likely because the hardware (SoC peripherals) drivers were written for 2.6.x and not forward ported to newer versions of the linux kernel. A lot of hardware drivers were (are) written by the hardware (chip) manufacturers and then abandoned.


when you say a lot, the reality is this is the case with basically every single ARM SoC on the market (and a few x86 ones too!).


From the output I see:

  23296         0x5B00          LZMA compressed data, properties: 0x5D, dictionary size: 8388608 bytes, uncompressed size: 97476 bytes
  64968         0xFDC8          XML document, version: "1.0"

So it looks like the size of the bootloader should be 64968 - 23296 = 41672. But he extracts 41162:

  $ dd if=archer-c7.bin of=u-boot.bin.lzma bs=1 skip=23296 count=41162
Curious if anybody knows why 41162; is this a block-size alignment requirement?
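To make the arithmetic concrete, here is a minimal Python sketch of the same carve. The helper `carve` is my own stand-in for `dd bs=1 skip=... count=...`; the numbers are the ones quoted above.

```python
def carve(blob: bytes, skip: int, count: int) -> bytes:
    """Return count bytes starting at offset skip, like dd with bs=1."""
    return blob[skip:skip + count]


LZMA_START = 23296   # 0x5B00, where binwalk reports the LZMA data
NEXT_SIG = 64968     # 0xFDC8, where the next signature (the XML document) starts
EXTRACTED = 41162    # the count used in the article's dd command

gap = NEXT_SIG - LZMA_START      # bytes between the two signatures
leftover = gap - EXTRACTED       # bytes in that gap the article does not extract

if __name__ == "__main__":
    print(gap, leftover)
    # With the firmware on disk this would be:
    #   data = open("archer-c7.bin", "rb").read()
    #   open("u-boot.bin.lzma", "wb").write(carve(data, LZMA_START, EXTRACTED))
```

So the distance between signatures is only an upper bound on the payload size; the exact length has to come from somewhere else.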


I'm wondering how these values are determined too. I'm "following along at home" without any idea what I'm doing (though all the files, bytes, and offsets are matching with the tutorial... Also, if the original author finds this thread: amazing write-up - got me really interested in the topic!).

At the step where they remove the header with

    dd if=uImage of=Image.lzma bs=1 skip=72
It results in a file that, when I try to uncompress it with `unlzma Image.lzma`, fails with "Compressed data is corrupt".

I don't know where the magic number "72" comes from. Is it likely that could be different on my machine (a mac)?

[edit: I think there's something else wrong - if I use `mkimage` to examine the uImage file I only get:

    mkimage -l uImage
    GP Header: Size 27051956 LoadAddr 78a267ff
Instead of image information]


The 72 bytes come from the difference between the offset of the uImage header and the offset of the LZMA data inside it, as listed in the post: 0x132b8 - 0x13270 = 72 (decimal).

So you'll need to check what binwalk says about your image; the uImage header isn't necessarily fixed in size. Also see the comment above about the --dd switch, though mind the reply to that pointing out you might want to check what it finds before just letting it write a pile of files.
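One way to sanity-check the carved uImage is to decode its header yourself. A minimal Python sketch, assuming U-Boot's legacy image format (a fixed 64-byte big-endian header starting with the magic number 0x27051956, which incidentally is the same `27051956` that shows up as "Size" in the mkimage output quoted above). The field names are my own labels, and the sketch only decodes the fixed 64 bytes; it does not try to explain whatever fills the rest of a 72-byte gap.

```python
import struct
from collections import namedtuple

UIMAGE_MAGIC = 0x27051956
# Big-endian: seven 32-bit words, four single bytes, 32-byte name = 64 bytes.
HEADER = struct.Struct(">7I4B32s")

UImageHeader = namedtuple(
    "UImageHeader",
    "magic header_crc timestamp data_size load_addr entry_point data_crc "
    "os arch image_type compression name",
)


def parse_uimage_header(raw: bytes) -> UImageHeader:
    """Decode the fixed 64-byte legacy uImage header at the start of raw."""
    if len(raw) < HEADER.size:
        raise ValueError("need at least 64 bytes")
    fields = HEADER.unpack(raw[:HEADER.size])
    if fields[0] != UIMAGE_MAGIC:
        raise ValueError("magic number is not 0x27051956")
    name = fields[-1].rstrip(b"\x00").decode("ascii", "replace")
    return UImageHeader(*fields[:-1], name)


if __name__ == "__main__":
    # Synthetic header with example values, not taken from any real firmware.
    raw = HEADER.pack(UIMAGE_MAGIC, 0, 0, 1000, 0x80060000, 0x80060000, 0,
                      5, 5, 2, 3, b"MIPS OpenWrt Linux-3.3.8")
    print(parse_uimage_header(raw))
```

If `data_size` is larger than what is left in the file after the header, the carve was probably cut short or started at the wrong offset.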


The 41162 bytes come from the preceding uImage header; you'll see it listed in that big description. I'm not sure what the remaining 510 bytes are, though. Just padding? A checksum?


Maybe bootloader code?


Given the TERMS OF USE under TP-Link's privacy policy [ https://www.tp-link.com/us/about-us/privacy/ ] it seems like they consider it illegal to do any of this. Their terms, along with the "we don't even pretend to care about your privacy rights" attitude have made me question any further purchase of TP-Link products.

Relevant quotes: "By using the Products or Services in any way, you agree to the Terms. " "Also, modifying, translating, adapting, or otherwise creating derivative works and improvements, decompiling, decoding, reverse engineering, disassembling, or otherwise reducing the code used in any software in connection with the Services into a readable form in order to examine the source code or construction of such software and/or to copy or create other products based (in whole or in part) on such software, is prohibited."


How does that jibe with the GPL code they are shipping?


glad i flashed latest dd-wrt beta on my archer-c7 v5 :D. though my wan-facing device runs OPNSense.

i actually prefer to run Tomato, but archer c7 is not broadcom :(

can anyone offer advice about dd-wrt vs openwrt (considering trying openwrt).


Latest version of OpenWRT (19) runs noticeably better on this device, with better HW offloading support and based on a nearly mainline, modern Linux kernel and a brand new device-tree for the Atheros SoC.

What reasons do you have to stay on dd-wrt?


> What reasons do you have to stay on dd-wrt?

mostly that i've used it before. can i gui-flash to openwrt from dd-wrt? i've done tftp flashes before but they're pretty fiddly with getting the stupid 30-30-30 or whatever timing right. also i think these routers try to "pull" from a tftp server rather than having you push to one that they bootstrap - i've never been able to get the "pull" variant to work.

would be hell of a lot easier if the router could be booted into something like android's (arm's?) fastboot or flashmode mode so i can just push an image.


Going from dd-wrt to openwrt should be as simple as a firmware flash from the web gui, and an nvram reset. Worst case, you can flash a "revert to stock" image from ddwrt to go back to factory, then flash openwrt as if the device was factory.

Openwrt also has a handy failsafe built into a lot of models. It boots a stripped down http server where you can upload recovery firmware.

Used to swear by dd-wrt, now I prefer openwrt.


Flashing the OpenWRT “factory” (as opposed to sysupgrade) image in the web UI should probably work fine, but don’t quote me on it.

That’s how I flashed from stock to OpenWRT on 3+ Archer units anyway. Make sure not to keep settings.


>i actually prefer to run Tomato, but archer c7 is not broadcom :(

Not being Broadcom is a very good thing.


reading more about it, you're right. i always figured that since broadcom was so widely supported by multiple aftermarket firmwares, that it was the most mod-friendly. guess it was just the most thoroughly reversed :/


Broadcom is absolutely the worst, because it's the most open-source-unfriendly. It's only had a lot of reverse-engineering attention because it's so ubiquitous (not just for routers, but for laptops too), but it's all no thanks to Broadcom.

Atheros and Intel I believe both have good open-source support.


I’m running OpenWRT and the Archer c7 is on the list of supported devices. I’d say give it a try.


I use Gargoyle on my Archer C7 v2. This thread (https://www.gargoyle-router.com/phpbb/viewtopic.php?t=11896) says that C7 v5 is supported.


Did you notice your wireless signal strength considerably lower when going to dd-wrt?

I put openwrt on my c7 V5 and could barely get any bars.

Flashed back to the stock and was back in business.

Another thing I've read is the third party firmwares don't get hardware access to NAT resulting in speed hits.

Cheers


yes, and i had throughput issues when running in full-width G/N mixed mode compared to my previous Tomato/Asus RT-N16 setup. my phone would also drop out and reconnect intermittently with the c7. but in dedicated AC it seems to be doing well thus far. i cannot say for sure whether this was due to DD-WRT or not as i did not do a thorough comparison to stock.

> Another thing I've read is the third party firmwares don't get hardware access to NAT

i read that too :(


Thank you for the reply, once we lose the last of our 2.4 devices maybe I'll re-try in AC only.


Where can one find the dd-wrt you used for your c7? I have the same device and have been unable to get it to flash anything other than official firmware.


These are the instructions I successfully followed on my C7 V2: https://wiki.dd-wrt.com/wiki/index.php/TP_Link_Archer_C7#Ins...

Here is the exact `factory-to-ddwrt` image I used (this will depend on which version you have): ftp://ftp.dd-wrt.com/betas/2019/10-15-2019-r41328/tplink_archer-c7-v2/


I'm trying to repeat the steps from the article. After this command:

    dd if=uImage of=Image.lzma bs=1 skip=72

I try to unpack the LZMA file:

    unlzma Image.lzma

and get this message:

    unlzma: Image.lzma: Compressed data is corrupt

Does this mean I downloaded a corrupted zip file from the TP-Link site? How can I extract the kernel image? Binwalk says this about Image.lzma:

    0             0x0             LZMA compressed data, properties: 0x6D, dictionary size: 8388608 bytes, uncompressed size: 3164228 bytes
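The fields binwalk prints for an LZMA hit can be read straight from the first 13 bytes of the file, which helps confirm that the carve starts at the right place. A minimal Python sketch, assuming the legacy ".lzma" container layout (one properties byte, a 4-byte little-endian dictionary size, an 8-byte little-endian uncompressed size); the function name is my own.

```python
import struct


def parse_lzma_alone_header(raw: bytes):
    """Decode the 13-byte header of a legacy .lzma stream into a dict."""
    if len(raw) < 13:
        raise ValueError("need at least 13 bytes")
    props, dict_size, size = struct.unpack("<BIQ", raw[:13])
    if props >= 9 * 5 * 5:
        raise ValueError("invalid LZMA properties byte")
    return {
        "lc": props % 9,          # literal context bits
        "lp": (props // 9) % 5,   # literal position bits
        "pb": props // 45,        # position bits
        "dict_size": dict_size,
        # All ones means the size is not stored and an end marker is used.
        "uncompressed_size": None if size == 0xFFFFFFFFFFFFFFFF else size,
    }


if __name__ == "__main__":
    # Header rebuilt from the values binwalk reported above.
    header = bytes([0x6D]) + struct.pack("<IQ", 8388608, 3164228)
    print(parse_lzma_alone_header(header))
```

If the decoded values match binwalk's line, the header is intact and the problem is more likely further into the stream or after its end.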


I don't understand how to unpack Image.lzma: why does "unlzma Image.lzma" fail while "binwalk -e Image.lzma" works correctly?
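One possible explanation, which nothing in this thread confirms, is that the carved file has extra bytes after the end of the LZMA stream (dd without a count copies everything up to the end of the file), and a strict decoder rejects them while a more tolerant one stops at the end of the stream. A minimal Python sketch of the tolerant approach using the standard `lzma` module; the function name and sample data are invented.

```python
import lzma


def decompress_ignoring_trailing(raw: bytes):
    """Decompress one legacy .lzma stream; return (data, trailing_bytes)."""
    decoder = lzma.LZMADecompressor(format=lzma.FORMAT_ALONE)
    data = decoder.decompress(raw)
    if not decoder.eof:
        raise ValueError("input ended before the stream was complete")
    # unused_data holds whatever followed the end of the compressed stream.
    return data, decoder.unused_data


if __name__ == "__main__":
    payload = b"kernel image bytes " * 100
    carved = lzma.compress(payload, format=lzma.FORMAT_ALONE) + b"\xff" * 64
    data, trailing = decompress_ignoring_trailing(carved)
    print(len(data), len(trailing))
```

If `trailing` comes back non-empty for the real file, trimming the carve to the stream's actual length should let a strict tool accept it.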


Did I read the blog wrong, or was the stock firmware also based on an OpenWRT kernel?

That would be pretty hilarious if it was true.


I'm pretty sure a lot of stock firmware is based on OpenWRT or used to be, though I'm pretty sure most of them lag well behind the current version. I haven't paid much attention for a while, but I think a lot were based on Kamikaze which is more than 10 years old now.

For the vendors with access to closed-source drivers and chipset info they can likely support devices not supported on the open source packages.

Edit: Per Wikipedia, "Qualcomm's QCA Software Development Kit (QSDK) which is being used as a development basis by many OEMs is an OpenWrt derivative"

It also notes Ubiquiti's wireless router firmware as being derived from OpenWRT, but I thought I remembered discussion of Ubiquiti being derived from a different open source distribution - unless perhaps the routers and wireless devices don't share a code base.


That's pretty cool. I didn't know that.

Looking into the equivalent firmware[1] for my Archer C7 v2, I didn't find any OpenWRT bits though. I was honestly a little bit disappointed.

I guess the difference between hardware revisions might be more fundamental than I assumed.

    DECIMAL       HEXADECIMAL     DESCRIPTION
    --------------------------------------------------------------------------------------------------------
    0             0x0             TP-Link firmware header, firmware version: 1.-15188.3, image version: "",
                                  product ID: 0x0, product version: -956301310, kernel load address: 0x0,
                                  kernel entry point: 0x80002000, kernel offset: 16384512, kernel length:
                                  512, rootfs offset: 855873, rootfs length: 1048576, bootloader offset:
                                  15204352, bootloader length: 0
    71520         0x11760         Certificate in DER format (x509 v3), header length: 4, sequence length: 64
    98560         0x18100         U-Boot version string, "U-Boot 1.1.4 (Mar  5 2018 - 13:57:29)"
    98736         0x181B0         CRC32 polynomial table, big endian
    131584        0x20200         TP-Link firmware header, firmware version: 0.0.3, image version: "",
                                  product ID: 0x0, product version: -956301310, kernel load address: 0x0,
                                  kernel entry point: 0x80002000, kernel offset: 16252928, kernel length:
                                  512, rootfs offset: 855873, rootfs length: 1048576, bootloader offset:
                                  15204352, bootloader length: 0
    132096        0x20400         LZMA compressed data, properties: 0x5D, dictionary size: 33554432 bytes,
                                  uncompressed size: 2451644 bytes
    1180160       0x120200        Squashfs filesystem, little endian, version 4.0, compression:lzma, size:
                                  9878520 bytes, 789 inodes, blocksize: 131072 bytes, created: 2018-03-05
                                  06:16:10

[1] https://static.tp-link.com/2018/201806/20180611/Archer%20C7(...


The BOM can vary quite a lot between 'revisions', using your product as an example...

https://openwrt.org/toh/tp-link/archer-c7-1750 (Scroll down to the Info Links table and the Wikidevi Info column)

v1 to v2 upgrades the flash (8MB to 16MB) and uses a slightly different AN+AC wifi chip. v2 and v3 seem pretty similar at a glance. v4 is rated at 12V 2A rather than 2.5A, and uses a completely different BGN (2.4GHz) chip and a different ethernet chip/switch. v5 is lower power still at 1.5A, but it's less obvious where that change happened due to a lack of pictures. A guess based on the simpler antenna list is that it uses fewer antennas.


EnGenius access points also ship with (an outdated and modified version of) OpenWRT.


Ubiquiti is based on Vyatta.


Which is the predecessor to VyOS: https://www.vyos.io/

It's Open source too for anyone that wants to run it.


Given this line...

image name: "MIPS OpenWrt Linux-3.3.8"

I would say you are right.


Another similar tool to look at is Hachoir.


If you like binwalk, you might want to check out the commercial product, Centrifuge[1], that the developers are working on (I know the CSO).

[1] https://www.refirmlabs.com/centrifuge-platform/


I am really surprised that firmware images are not just .tar.gz files renamed to .bin :/. That's how I would have implemented a distribution of new firmware.


And how do you partition boot-loaders, kernels, and rootfs and such in that tar.gz?

An embedded device will be hard-coded to look at a fixed point and start booting from there; there's no UEFI. How will you ensure boot-loaders get unpacked precisely where they need to be?

And that doesn’t even touch the idea of having a router understand a file system before any firmware code is loaded.

Routers really are quite different from PCs.


I think firmware images are typically not the fixed ROM code the CPU first encounters upon startup, even if they contain U-Boot. Especially if stored in NAND flash they probably aren't.

On the AR7 platform, for example, the MIPS core runs a small ROM that initializes RAM, then reads some blocks from flash. I'm not sure how much code you'd need to unpack a tar.gz, but it's completely possible.


> And how do you partition boot-loaders, kernels, and rootfs and such in that tar.gz?

In the past, each of those would be a separate MTD partition with a separate device file. You just dd them over those files.


True enough, but I think they used to be even more unique and over time they've become more like PCs.

One of these days I'm going to log in to the admin interface and find candy crush installed.


They're "like PCs" in the sense that the instruction set of the CPUs has caught up and in theory you can attach more complicated peripherals. However, unless your embedded product has MMC flash attached (for many applications it doesn't due to cost + physical size) you're SOL for the following reasons:

1. For M4s your storage is typically some kind of SPI flash which doesn't act like the traditional desktop flash you're dealing with. You have to manually specify the address you're reading/writing & you have to do it on block boundaries (multiple KB). You're generally looking at 8-64MB.

2. For M0 your storage is typically flash built-in with potentially even more restrictions.

3. These devices have very little RAM. Decompression means you have to have a way of enforcing constraints on the amount of space you'll need. Aside from the space needed regularly for decompression you may need to buffer the decompressed content in-memory to align with block boundaries.

All of this means development time, increased costs & risk for something you may not be able to pull off.

If your vendor actually internally compresses their image then great but generally they don't for all the same reasons (+ sometimes this is touching ROM code in the chip).


One thing hinted at by the other comment thread, but not brought up: in the embedded world, read-write filesystems as you know them are less common, and usually a failsafe mode is desirable. OpenWRT, for instance, uses a JFFS overlay on top of a squashfs (at least in a recent-ish build for a router I have). So you change out the squashfs (and try to figure out what to do with the overlay filesystem), rather than replacing individual files.



