This is indeed a cool tool! I've used it before when forensically analyzing a cell phone, and found interesting things. For example, I found that a web browser had cached the unencrypted bytes from an HTTP message. Binwalk identified the gzip header's magic number (1f 8b), and after decompression there were interesting results.
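At its simplest, what binwalk did there can be sketched in a few lines: scan for the two gzip magic bytes and attempt a decompression at each hit. This is a toy illustration of the idea, not binwalk's actual implementation:

```python
import gzip
import zlib

def find_gzip_members(data):
    """Scan a blob for the gzip magic (1f 8b) and try to decompress each hit."""
    results, start = [], 0
    while (idx := data.find(b"\x1f\x8b", start)) != -1:
        try:
            # wbits=31 tells zlib to expect a gzip wrapper around the deflate data
            payload = zlib.decompressobj(wbits=31).decompress(data[idx:])
            results.append((idx, payload))
        except zlib.error:
            pass  # the magic bytes occurred by chance; not a real stream
        start = idx + 2
    return results

# Demo: bury a gzip stream inside junk bytes, like a browser cache file might
blob = b"\x00" * 100 + gzip.compress(b"GET /secret HTTP/1.1") + b"\xff" * 50
hits = find_gzip_members(blob)
print(hits[0])  # (100, b'GET /secret HTTP/1.1')
```

The try/except is doing the real work: two magic bytes alone produce plenty of false positives, so each candidate has to actually decompress before it counts.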
Another cool tool I learned about recently is signsrch. It's more for reverse engineering binaries of software that implements encryption of some type. It'll find signatures in the binaries of these encryption methods, giving you a place to look when, for example, reverse engineering a file format that you suspect is encrypted in some way.
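The idea behind signsrch fits in a few lines: crypto code almost always embeds well-known constant tables, so scanning a binary for those byte patterns flags where the crypto lives. A miniature sketch with a single signature (the real tool ships thousands):

```python
# First eight bytes of the AES S-box, a table almost every AES
# implementation embeds verbatim.
AES_SBOX_PREFIX = bytes([0x63, 0x7C, 0x77, 0x7B, 0xF2, 0x6B, 0x6F, 0xC5])

def find_signatures(binary):
    """Return (name, offset) pairs for each known-constant hit."""
    hits, start = [], 0
    while (i := binary.find(AES_SBOX_PREFIX, start)) != -1:
        hits.append(("AES S-box", i))
        start = i + 1
    return hits

# Demo: a fake binary with the table embedded at offset 32
fake = b"\x90" * 32 + AES_SBOX_PREFIX + b"\x00" * 32
print(find_signatures(fake))  # [('AES S-box', 32)]
```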
Cool tool!
I wrote something for reverse-engineering code, as a consultant years ago. They had a radio module but the manufacturer had lost the source code.
So the tool was called Golem. It had tables defining opcode-to-assembler pattern matching that could be written for any machine (instead of just the one I was cracking).
It worked iteratively. You ran it over the binary once and it produced arbitrary labels at jump points. You could annotate that output by changing the labels to something human-readable (e.g. Loop-back, Main, TimerISR etc.) and add comments.
The next iteration would read that back in to build a symbol table, rescan the binary and re-output. But this time it would understand that the symbols were always on opcode boundaries, distinguish data table from code entry points (because you marked them) etc. So it would do a better job of staying in sync with the code.
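That two-pass flow could be sketched like this — an invented one-byte toy ISA, nothing like the real radio module's instruction set. Pass 1 invents labels at jump targets; after the human renames them, pass 2 re-reads the symbol table:

```python
def scan(code, symbols=None):
    """One Golem-style pass: disassemble, labelling known jump targets.
    Toy ISA: 0x10 = JMP <addr>, 0xFF = RET, anything else = NOP."""
    symbols = dict(symbols or {})
    # Sweep for jump targets, stepping over operands to stay on opcode
    # boundaries (the "staying in sync" problem mentioned above).
    targets, pc = set(), 0
    while pc < len(code):
        if code[pc] == 0x10:
            targets.add(code[pc + 1]); pc += 2
        else:
            pc += 1
    for t in targets:
        symbols.setdefault(t, f"L_{t:04x}")  # arbitrary label; user may rename
    lines, pc = [], 0
    while pc < len(code):
        label = symbols.get(pc, "")
        if code[pc] == 0x10:
            lines.append(f"{label:>8} JMP {symbols[code[pc + 1]]}"); pc += 2
        elif code[pc] == 0xFF:
            lines.append(f"{label:>8} RET"); pc += 1
        else:
            lines.append(f"{label:>8} NOP"); pc += 1
    return lines, symbols

code = bytes([0x00, 0x10, 0x00, 0xFF])    # NOP; JMP 0; RET
listing, _ = scan(code)                   # pass 1: arbitrary labels
relisting, _ = scan(code, {0: "Main"})    # pass 2: after human annotation
```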
Once I was done with that project (and had re-compilable source for the radio module) I put it away and never thought of it again.
You were on your way to cloning IDA Pro, Ghidra, Binary Ninja, or Hopper Disassembler. To varying degrees, sometimes as a pay-extra option, those tools can produce source code.
They are by no means just Intel disassemblers, though some of them have freeware versions (to suppress competition) or purposely limited, time-limited demos. They are very much designed around humans adding clues: you can declare function parameters, struct types, enumerations, and the meaning of various offsets in code. They are interactive GUI tools, continuously updating their automated analysis as the user provides clues to the analysis engine. Ghidra and Binary Ninja can even be multi-user, storing the database on a server for collaboration.
Just a quick note to be careful extracting what binwalk considers to be 'everything' (such as the pattern above, or -e for known file types) on larger files. Sometimes there will be far more matches than you expect (such as in a .pcap file). You can magically extract gigabytes of data from a 100MB file, which is rarely helpful and takes a long time.
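A cheap sanity check before a full -e run is to count how many signature hits you would be extracting. A sketch of the idea (three hard-coded signatures here; real binwalk uses a far larger, validated signature set):

```python
# A few common magic-byte signatures (a tiny subset for illustration)
SIGNATURES = {b"\x1f\x8b\x08": "gzip", b"PK\x03\x04": "zip", b"\x89PNG": "png"}

def count_hits(data):
    """Count raw signature matches per type, without extracting anything."""
    counts = {}
    for magic, name in SIGNATURES.items():
        n, start = 0, 0
        while (i := data.find(magic, start)) != -1:
            n, start = n + 1, i + 1
        counts[name] = n
    return counts

# A pcap-like blob full of zip-looking fragments would inflate extraction
blob = b"PK\x03\x04" * 3 + b"\x89PNG" + b"\x00" * 16
print(count_hits(blob))  # {'gzip': 0, 'zip': 3, 'png': 1}
```

If the counts look absurd relative to the file size, that is the cue to extract selectively rather than letting everything hit the disk.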
A slightly related question for HNers: Is there any easy tool for a non-cs guy to reverse engineer a binary file containing numbers and text in some specific format?
I have to work with some old structural analysis software. The material and element definitions come in an obscure file format ".PF3CMP". I know it contains text like the material names, and numbers/letters for the material properties.
Ultimately it's my goal to be able to write these files from MATLAB or Python, instead of using the horribly clunky user interface. But first I need to know the structure of the file, and I'm not even sure how to begin figuring that out.
[0] is what it looks like when opened in a hex editor
I don't know of any straightforward tools, most people I've seen reverse engineer a format do it with a hex editor and writing custom scripts. It's not directly relevant but the best I've seen is this presentation about reverse engineering the protocol used to communicate within a car: https://www.youtube.com/watch?v=KkgxFplsTnM
It uses some techniques that might be relevant, like monitoring different parts of a file as you make different changes (like accelerating or decelerating). In your case it might be possible to compare between different material definitions for example.
Ok thanks, I'll take a look. It's possible for me to generate these files for each of the various material settings so I can manually 'diff' them, similar to what you're describing.
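The diffing can be as simple as comparing two exports byte by byte and seeing which offsets track the setting you changed. A sketch — the "PF3CMP" layout below is invented purely for the demo; only the technique is the point:

```python
import struct

def byte_diff(a, b):
    """Offsets where two equal-length blobs differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]

# Two fake exports that differ only in one material property
# (Young's modulus for steel: 210 GPa vs 200 GPa, stored as float64)
header = b"PF3CMP\x00"
file_a = header + b"STEEL\x00" + struct.pack("<d", 210e9)
file_b = header + b"STEEL\x00" + struct.pack("<d", 200e9)
diffs = byte_diff(file_a, file_b)
print(diffs)
```

The differing offsets cluster where the changed value lives, which tells you both its location and (from the width of the cluster) a hint about its encoding.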
It sounds like you might eventually be able to write a kaitai struct [0] for the resulting format which would make it fairly easy to use the format in your language of choice.
I don’t think there’s compression or encryption. I can search and find the hex representation of text and values that I expect to be there. I guess I need to bite the bullet and spend some time tagging the parameters I know, then figuring out the pattern of padding that is in between.
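For tagging the numeric parameters, one trick is to search for an expected value under a few common binary encodings, since you may not know whether it is stored as float32, float64, or an integer. A sketch (the candidate encodings are guesses, not anything known about the format):

```python
import struct

def find_value(data, value):
    """Offsets where a known number appears, under a few guessed encodings."""
    hits = []
    for fmt in ("<f", "<d", "<i"):  # float32, float64, int32 (all guesses)
        needle = struct.pack(fmt, int(value) if fmt == "<i" else value)
        start = 0
        while (i := data.find(needle, start)) != -1:
            hits.append((fmt, i))
            start = i + 1
    return hits

# Demo: a density of 7850 (steel, kg/m^3) stored as a float64 at offset 4
blob = b"\x00" * 4 + struct.pack("<d", 7850.0) + b"\xff" * 4
print(find_value(blob, 7850.0))  # [('<d', 4)]
```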
Have you tried the 'file' command on various *nix systems (you can download it for Windows too)? It mightn't know this format, but I think it will tell you if it finds compressed (zipped) data streams in common formats, which would be a useful first step since many files have some form of compression.
I'll also echo the other comment about reverse engineering the reading functions. Some formats only include certain structures if necessary so even if you have a lot of files you might be missing some example data to complete the picture.
Depending on how weird the format is, it might be more efficient to reverse-engineer the file-reading routines of that program which can work with these files.
Something like this may help: https://ide.kaitai.io/, but I've found it a bit overwhelming.
It might be easiest to just start writing a utility that parses it, first making guesses and then refining as you generate and test more files like you mentioned in another reply. You already know what the magic bytes are at the start of the file - PF3CMP.
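That first-guess parser can start as little more than a magic check plus a guessed record layout, then be refined as real files break it. A sketch — the length-prefixed record structure here is pure invention, standing in for whatever the real layout turns out to be:

```python
import io
import struct

def parse_pf3cmp(stream):
    """First-guess parser: verify magic, then read length-prefixed records.
    The record layout is a guess to be refined against real files."""
    magic = stream.read(6)
    if magic != b"PF3CMP":
        raise ValueError(f"not a PF3CMP file: {magic!r}")
    records = []
    while (hdr := stream.read(2)):
        (length,) = struct.unpack("<H", hdr)  # guessed: u16 length prefix
        records.append(stream.read(length))
    return records

# Demo input built to match the guessed layout
fake = b"PF3CMP" + struct.pack("<H", 5) + b"STEEL" + struct.pack("<H", 2) + b"AL"
print(parse_pf3cmp(io.BytesIO(fake)))  # [b'STEEL', b'AL']
```

Every file that fails to parse is information: the exception points at exactly the offset where your guess and the real format diverge.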
This is amazing! I’ve used binwalk extract for ‘capture the flag’ challenges but I never really thought about the practical applications of it. Wow! Thank you
Conversely, it's a convenient tool for obfuscation. You can trigger plausible false positives all over, while also making sure that there's nothing of immediate use with binwalk left.
>Although the firmware was released last year (August 2019) as I write this article, it uses an old Linux kernel version (3.3.8) released in 2012 compiled with a very old GCC version (4.6) also from 2012!
This is what happens when you pay peanuts for embedded devs and outsource development to the cheapest sweatshop you can find so your products can meet a competitive price point.
Sadly this will not change until there's regulation in place to hold manufacturers accountable for their massively obvious vulnerabilities. Nobody cares that they're flooding the market with potential botnet hosts when the devs are overworked, paid miserably, and have a manager constantly breathing down their neck.
Exactly. What I see is that the SoC provider just freezes everything at a given version and supports just that. For example I am currently building Android 9 on a QCOM SoC with a 4.9 Kernel. I don't think it will receive any future update...
So how did OpenWRT manage to build firmware with up to date components for it? The Qualcomm chips inside of it seem fairly modern for such an old kernel.
OpenWRT doesn't guarantee support of all hardware. I have a router flashed to a certain version with a newer kernel, and the Wifi doesn't work because of no driver available.
Incidentally, this very router's OpenWRT isn't as fast as the manufacturer's firmware, because it lacks proper drivers for the hardware NAT and has to do it in software.
Qualcomm networking chips mostly have open-source drivers, which are mostly upstreamed to the mainline Linux kernel releases. So there's nothing really holding back kernel versions on that hardware, other than the regular maintenance burden of keeping a Linux distro up to date—which few OEMs are interested in doing. Broadcom by contrast is less friendly to open-source, and their closed-source drivers can preclude kernel updates.
Note that openwrt has a big community of contributors and not all devices/features are supported. In contrast the manufacturer firmware is at least feature complete and easy for regular users to set up.
OpenWrt is also free. Both as free software, and free of cost. When you're paying a manufacturer for a product, surely it's not too much to expect them to ship with functional software that also happens to be up-to-date and secure?
You can get that, but not at consumer-grade router prices. I have a separate router that I put behind my stand-alone cable modem. I paid for that separate router about $200.00. And another $100 for the modem. A wifi access point cost me another $100.
So it's about $400.00 for a router that has updated firmware(pfSense). Or you can cheap out and spend only $100.00. This is what you get by doing that.
Support varies, you should purchase devices that include hardware which is supported by the Open Source drivers (even if you have to compromise and it still uses some small blobs that are free to distribute).
You should also purchase a device that includes enough storage space and RAM to support more than the bare minimum; that will help keep things future proof.
The primary reason is likely because the hardware (SoC peripherals) drivers were written for 2.6.x and not forward ported to newer versions of the linux kernel. A lot of hardware drivers were (are) written by the hardware (chip) manufacturers and then abandoned.
I'm wondering how these values are determined too. I'm "following along at home" without any idea what I'm doing, though all the files, bytes, and offsets match the tutorial. (Also, if the original author finds this thread: amazing write-up - it got me really interested in the topic!)
At the step where they remove the header with
dd if=uImage of=Image.lzma bs=1 skip=72
It results in a file that, if I try to decompress it with `unlzma Image.lzma`, complains with "Compressed data is corrupt"
I don't know where the magic number "72" comes from. Is it likely that could be different on my machine (a mac)?
[edit: I think there's something else wrong - if I use `mkimage` to examine the uImage file I only get:
mkimage -l uImage
GP Header: Size 27051956 LoadAddr 78a267ff
The 72 bytes is the difference between the uImage header offset and the LZMA offset inside it, as reported in the post: 0x132b8 - 0x13270 = 0x48 = 72 decimal.
So you'll need to check what binwalk says about your image; the uImage header isn't necessarily fixed in size. Also see the comment above about the --dd switch, though mind the reply pointing out that you might want to check what it finds before just letting it write a pile of files.
The 41162 bytes come from the preceding uImage header; you'll see it listed in that big description. I'm not sure what the 510 bytes of padding are, though. Just padding? A checksum?
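To make the arithmetic concrete: rather than hard-coding skip=72, you can compute it from the two offsets binwalk reports for your particular image. The offsets below are the ones from the post; a different firmware version will report different ones, which would explain a "Compressed data is corrupt" error from a hard-coded skip:

```python
# Offsets binwalk reported in the post (yours may differ):
UIMAGE_OFFSET = 0x13270  # the "uImage header" line
LZMA_OFFSET = 0x132B8    # the "LZMA compressed data" line

skip = LZMA_OFFSET - UIMAGE_OFFSET
print(f"dd if=uImage of=Image.lzma bs=1 skip={skip}")
# prints: dd if=uImage of=Image.lzma bs=1 skip=72
```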
Given the TERMS OF USE under TP-Link's privacy policy [ https://www.tp-link.com/us/about-us/privacy/ ] it seems like they consider it illegal to do any of this. Their terms, along with the "we don't even pretend to care about your privacy rights" attitude have made me question any further purchase of TP-Link products.
Relevant quotes:
"By using the Products or Services in any way, you agree to the Terms. "
"Also, modifying, translating, adapting, or otherwise creating derivative works and improvements, decompiling, decoding, reverse engineering, disassembling, or otherwise reducing the code used in any software in connection with the Services into a readable form in order to examine the source code or construction of such software and/or to copy or create other products based (in whole or in part) on such software, is prohibited."
Latest version of OpenWRT (19) runs noticeably better on this device, with better HW offloading support and based on a nearly mainline, modern Linux kernel and a brand new device-tree for the Atheros SoC.
mostly that i've used it before. can i gui-flash to openwrt from dd-wrt? i've done tftp flashes before but they're pretty fiddly with getting the stupid 30-30-30 or whatever timing right. also i think these routers try to "pull" from a tftp server rather than having you push to one that they bootstrap - i've never been able to get the "pull" variant to work.
would be hell of a lot easier if the router could be booted into something like android's (arm's?) fastboot or flashmode mode so i can just push an image.
Going from dd-wrt to openwrt should be as simple as a firmware flash from the web gui, and an nvram reset. Worst case, you can flash a "revert to stock" image from ddwrt to go back to factory, then flash openwrt as if the device was factory.
Openwrt also has a handy failsafe built into a lot of models. It boots a stripped down http server where you can upload recovery firmware.
reading more about it, you're right. i always figured that since broadcom was so widely supported by multiple aftermarket firmwares, it was the most mod-friendly. guess it was just the most thoroughly reversed :/
Broadcom is absolutely the worst, because it's the most open-source-unfriendly. It's only had a lot of reverse-engineering attention because it's so ubiquitous (not just for routers, but for laptops too), but it's all no thanks to Broadcom.
Atheros and Intel I believe both have good open-source support.
yes, and i had throughput issues when running in full-width G/N mixed mode compared to my previous Tomato/Asus RT-N16 setup. my phone would also drop out and reconnect intermittently with the c7. but in dedicated AC it seems to be doing well thus far. i cannot say for sure whether this was due to DD-WRT or not as i did not do a thorough comparison to stock.
> Another thing I've read is the third party firmwares don't get hardware access to NAT
Where can one find the dd-wrt you used for your c7? I have the same device and have been unable to get it to flash anything other than official firmware.
Here is the exact `factory-to-ddwrt` image I used (this will depend on which version you have): ftp://ftp.dd-wrt.com/betas/2019/10-15-2019-r41328/tplink_archer-c7-v2/
I'm trying to repeat the steps from the article.
After this command:
dd if=uImage of=Image.lzma bs=1 skip=72
I try to unpack the lzma file:
unlzma Image.lzma
I get the message:
unlzma: Image.lzma: Compressed data is corrupt
Does this mean I downloaded a corrupted file from the TP-Link site? How can I extract the kernel image? Binwalk says this about Image.lzma:
0 0x0 LZMA compressed data, properties: 0x6D, dictionary size: 8388608 bytes, uncompressed size: 3164228 bytes
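What binwalk is decoding there is the 13-byte header at the start of a raw LZMA stream: one properties byte, a 4-byte little-endian dictionary size, then an 8-byte uncompressed size. You can cross-check its report by hand:

```python
import struct

def parse_lzma_header(hdr):
    """Decode the 13-byte raw-LZMA header: props, dict size, uncompressed size."""
    props = hdr[0]
    dict_size, usize = struct.unpack("<IQ", hdr[1:13])
    return props, dict_size, usize

# Reconstruct the header binwalk described above and decode it back
hdr = bytes([0x6D]) + struct.pack("<IQ", 8388608, 3164228)
print(parse_lzma_header(hdr))  # (109, 8388608, 3164228)
```

In a real investigation you'd feed it the first 13 bytes of Image.lzma; if the dictionary size or uncompressed size look nonsensical, you probably cut the file at the wrong offset rather than downloaded a corrupt image.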
I'm pretty sure a lot of stock firmware is (or at least used to be) based on OpenWRT, though most of it lags well behind the current version. I haven't paid much attention for a while, but I think a lot were based on Kamikaze, which is more than 10 years old now.
For the vendors with access to closed-source drivers and chipset info they can likely support devices not supported on the open source packages.
Edit: Per Wikipedia, "Qualcomm's QCA Software Development Kit (QSDK) which is being used as a development basis by many OEMs is an OpenWrt derivative"
It also notes Ubiquiti's wireless router firmware as being derived from OpenWRT, but I thought I remembered discussion of Ubiquiti being derived from a different open source distribution - unless perhaps the routers and wireless devices don't share a code base.
v1 to v2 upgrades the Flash (8MB to 16MB) and uses a slightly different AN+AC wifi chip. v2 and v3 seem pretty similar at a glance. v4 is rated at 12v 2a rather than 2.5a, using a completely different BGN (2.4GHz) chip and also a different ethernet chip/switch. v5 is lower power still at 1.5a, but it's less obvious where that change happened due to lack of pictures. A guess based on the simpler antenna list is that it uses fewer antennas.
I am really surprised that firmware images are not just .tar.gz files renamed to .bin :/. That's how I would have implemented a distribution of new firmware.
And how do you partition boot-loaders, kernels, and rootfs and such in that tar.gz?
Embedded device will be hard coded to look at a fixed point and start booting from there, there’s no UEFI. How will you ensure boot-loaders get unpacked precisely where they need to be?
And that doesn’t even touch the idea of having a router understand a file system before any firmware code is loaded.
I think firmware images are typically not the fixed ROM code the CPU first encounters upon startup, even if they contain U-Boot. Especially if stored in NAND flash they probably aren't.
On the AR7 platform, for example, the MIPS core runs a small ROM that initializes RAM, then reads some blocks from flash. Not sure how much code you'd need to unpack a tar.gz, but it's completely possible.
They're "like PCs" in the sense that the instruction sets of the CPUs have caught up and in theory you can attach more complicated peripherals. However, unless your embedded product has MMC flash attached (for many applications it doesn't, due to cost + physical size), you're SOL for the following reasons:
1. For M4s your storage is typically some kind of SPI flash which doesn't act like the traditional desktop flash you're dealing with. You have to manually specify the address you're reading/writing & you have to do it on block boundaries (multiple KB). You're generally looking at 8-64MB.
2. For M0 your storage is typically flash built-in with potentially even more restrictions.
3. These devices have very little RAM. Decompression means you have to have a way of enforcing constraints on the amount of space you'll need. Aside from the space needed for decompression itself, you may need to buffer the decompressed content in memory to align with block boundaries. All of this means development time, increased cost, and risk for something you may not be able to pull off.
If your vendor actually internally compresses their image then great but generally they don't for all the same reasons (+ sometimes this is touching ROM code in the chip).
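The block-boundary constraint from point 1 is the kind of thing that makes decompress-on-write painful. A sketch of the buffering a driver has to do — the 4 KiB block size and the flash-as-bytearray device are stand-ins for illustration, not any real HAL:

```python
BLOCK = 4096  # assumed erase-block size; real parts vary

class BlockWriter:
    """Buffer arbitrary-length writes into fixed-size, block-aligned writes,
    as SPI-flash drivers must."""
    def __init__(self, flash):
        self.flash = flash   # bytearray standing in for the SPI flash device
        self.buf = bytearray()
        self.addr = 0        # next block-aligned write address

    def write(self, data):
        self.buf += data
        while len(self.buf) >= BLOCK:
            block, self.buf = self.buf[:BLOCK], self.buf[BLOCK:]
            self.flash[self.addr:self.addr + BLOCK] = block
            self.addr += BLOCK

    def flush(self):
        # Pad the tail to a full block; drivers often pad with 0xFF,
        # the erased state of NOR flash.
        if self.buf:
            self.write(b"\xFF" * (BLOCK - len(self.buf)))

flash = bytearray(16 * BLOCK)
w = BlockWriter(flash)
w.write(b"\xAA" * 5000)  # decompressor output arrives in odd-sized chunks
w.flush()
```

Note the RAM cost this sketch hides: the tail buffer alone is up to one block, on devices where total RAM may be a few tens of KB.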
One thing hinted at by the other comment thread, but not brought up: in the embedded world, read-write filesystems as you know them are less common, and usually a failsafe mode is desirable. OpenWRT, for instance, uses a JFFS overlay on top of a squashfs (at least in a recent-ish build for a router I have). So you change out the squashfs (and try to figure out what to do with the overlay filesystem), rather than replacing individual files.
https://www.oreilly.com/library/view/learning-malware-analys...