
You reverse engineer the application, or you run it in a debugger.

If the app uses certificate pinning to block MITM eavesdropping through your own proxy, you either use one of the Xposed Framework modules that removes it on the fly with a process hook, or you decompile the app and return-void the GetTrustedClient, GetTrustedServer, AcceptedIssuers, etc. functions.

If it features HMAC signing, you decompile the app, find the key, reverse engineer the algorithm that chooses and sorts the parameters for the HMAC function, and rewrite it outside the app. If the key is generated dynamically you reverse engineer that too, and if it's retrieved from native .so files you're going to have a fun time, but it's still quite doable.
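To make the "rewrite it outside the app" step concrete, here is a minimal sketch in Python. Everything specific in it is an assumption for illustration: the canonicalization (sort parameters by name, URL-encode them), the choice of HMAC-SHA256, the `sig` parameter name and the key are all hypothetical. A real app will have its own scheme, which is exactly what you recover from the decompiled code.

```python
import hashlib
import hmac
from urllib.parse import urlencode


def canonical_query(params):
    """Build the string to sign: parameters sorted by name, URL-encoded.

    Hypothetical canonicalization; the real one comes from the decompiled app.
    """
    return urlencode(sorted(params.items()))


def sign_request(params, key):
    """Return a hex HMAC-SHA256 signature over the canonical query string."""
    message = canonical_query(params).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def signed_params(params, key):
    """Return a copy of params with the signature added as 'sig' (assumed name)."""
    out = dict(params)
    out["sig"] = sign_request(params, key)
    return out


if __name__ == "__main__":
    # Placeholder key; in practice this is whatever you pulled out of the app.
    key = b"key-extracted-from-the-app"
    print(signed_params({"user": "alice", "page": "2"}, key))
```

Because the parameters are sorted before signing, the order in which you build the request does not change the signature, which is usually the property the app's own signing code depends on.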

All they can do is pile on layers and layers of abstraction to make it painful. They can't make the private API truly private if it requires something shipped with the client.




The initial idea was to make your life simpler by parsing JSON instead of HTML. Now we are decompiling binaries. Somewhere on the way, we got lost.


Once you do the one-time work of pulling out the key, you can just add something like, "secret_key=foobar" to your requests, and you're back to happily parsing JSON.
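For the static-key case, the per-request work really is that small. A rough sketch, assuming the key travels as a query parameter; the host, path and the `secret_key` name are placeholders taken from the example above, not any real API:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_secret(url, secret_key):
    """Append the extracted key to a URL's query string as 'secret_key'."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("secret_key", secret_key))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # api.example.com is a placeholder host.
    print(with_secret("https://api.example.com/v1/items?page=2", "foobar"))
```

From there you fetch the URL with whatever HTTP client you like and parse the JSON as usual.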

If they keep changing it up, I'm sure you could automate the decompiling process. The reality is that this technique is security by obscurity at its core, and is therefore never going to succeed.


Skype is probably one example where it took developers 10+ years to figure out how the app worked.


Did it take that long to do it or did it take that long for someone to care to do it? I mean, it's Skype.


The story of every IT project ever...


In many cases, if all you're after is the data, you don't even need to reverse-engineer much; I'm not familiar with the mobile world but in Windows you can just send input events to the process, essentially simulating a user, and read its output in a similar fashion. You can still treat the app like a black box and regardless of how much obfuscation they put in the code, you still get the output.

(This technique is useful not just for scraping data, but for UI testing of your own apps.)


> All they can do is pile on layers and layers of abstraction to make it painful. They can't make the private API truly private if it requires something shipped with the client.

This is totally true, but the original premise was to do it just with a MITM. I was being generous and assuming most apps do dynamic generation of their keys. I'm probably wrong now that I think about it.


I feel like at a certain point this crosses the line from unintended use of a private API to unethical hacking.

If the data owner went through the trouble of encrypting the traffic between it and its app, they have a certain expectation of private communications that you'd better have a damn good reason for violating.


When an application you have legally installed on your own computer is communicating with the outside world, it seems a fundamental right to inspect the exchanged data to check that it is not leaking personal information. If the data is encrypted or obfuscated, this could make us suspicious (why hide if there is nothing to hide?) and gives additional motivation to audit the security.

Once the API is reverse engineered, we might be tempted to improve the usability of the application by adding some features (scraping all data). If this hurts the server (huge resource consumption), this becomes unethical and may become illegal.


And I suppose you personally test the physical security measures of every retail store you shop at?


No, but I do personally test the physical security measures of every car or computer I purchase and bring into my home.


It's certainly unintended use of a mobile API, but it's not hacking; it's reverse engineering. HMAC is used for client integrity verification as a signing algorithm; it's not used for generating hashes or ciphertexts of confidential user data. Furthermore, even if it were, it's operating on data that we are sending to the server in the first place. We aren't actually breaking encryption or cracking hashes for confidential user data, we are choosing to manually sign messages to the server using the same methodology as the application itself. Cryptographically speaking there is a very large difference in utility here. The only actual encryption present is the TLS, but both you and the server ultimately see the data.

Reverse engineering occupies a much more ethically and legally grey area than outright hacking because you are fundamentally taking software in your possession and modifying it. There are strong arguments that people should have the right to do this. It can lead to hacking, and it's useful for security research, but it is not in and of itself an attack on the application's security (you could make a case that it is an attack on the application's trust model, however).

Now, if the developers relied on the privacy of the API as a form of implicit authorization (i.e. by forging requests from the client I can retrieve another user's data using an insecure direct object reference on a username parameter), and I proceed to do that - yes, that's hacking. You're accessing confidential data in an unauthorized manner, just as you would be if an insecure direct object reference were present on the website. The developers made a mistake in conflating client validation with user authorization, but you've still passed a boundary there.

It is arguable that this is unethical or at least amoral, but if all you're doing is scraping publicly available data using the public mobile API, it is at least legally defensible until the other party sends you a C&D for using their API in an unauthorized manner (so long as you haven't agreed to a TOS by using the mobile API, which really depends on whether and how prominently they have a browserwrap in place). I think the spirit of your point is that someone probably just shouldn't be using an API if they're not authorized to do so, but it's a very important legal and technical distinction to make here that you aren't hacking by reversing the embedded HMAC process.


Know of any good guides on doing this? I want to do this for an app right now actually.


Not off the top of my head. The knowledge isn't tribal, but it's certainly scattered (few blog posts will take you the whole way) and the tools are...spartan.

I recommend you read CTF writeups (there was one hosted on GitHub where a team retrieved the request signing key for Instagram IIRC). Those are usually very tutorial-like, though they tend to take some level of knowledge for granted even if they don't intend to.

The other thing to do is pick up apktool, JD-GUI, dex2jar and maybe IDA Pro, Hopper or JEB and learn them as you go.


Android tends to be easier to decompile if you want to discover stored keys, etc. I've done it a few times by using Charles on desktop + setting up a proxy to run my iOS connection through my laptop, and then running the app on mobile.


>All they can do is pile on layers and layers of abstraction to make it painful. They can't make the private API truly private if it requires something shipped with the client.

Do note that if you become annoying and/or conspicuous enough, they can use legal force to stop you, and if the case actually goes through the process, you'll almost surely lose. This is true at least in the United States and Europe.

IANAL.


If they're smart enough to collect data in the app to detect when a request to their API is actually coming from their app (versus one coming from curl or from a script), then your reverse engineering problem just went from painful to unprofitable (i.e. not worth your time).


I'm not sure I understand this. The point of the reverse engineering is to simulate (exactly) requests from the app, using your script, thus making it impossible for them to detect. And the data collected in the app has no bearing on your scripted calls to the API outside the app.

I suppose I can envisage a scenario where the app sends a counter as a request id, and my script will then send the next value of the counter as its id, causing the next request from the app to re-use a counter value and thus fail, but the server API and the app have no way of knowing this is due to my script and not, say, network issues, and therefore it should still not affect my reverse engineering abilities...

Maybe, taking this further, the app could have baked in a certain number of unique, opaque 'request tokens' that are valid for one API call only, and when my script has used all of them it will cease to work, and in doing so cause my copy of the app to become useless, but again, not an insurmountable barrier.
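A toy model of that last scenario, just to make the mechanics visible. This is a sketch under my own assumptions (a fixed pool of baked-in tokens, a server that accepts each one exactly once); the class and token names are made up and no real API is being described:

```python
class OneTimeTokenServer:
    """Toy server that accepts each pre-issued request token exactly once."""

    def __init__(self, tokens):
        self._unused = set(tokens)

    def handle(self, token):
        """Return True if the token was still valid, and burn it."""
        if token in self._unused:
            self._unused.remove(token)
            return True
        return False


if __name__ == "__main__":
    baked_in = ["t1", "t2", "t3"]  # hypothetical tokens shipped inside the app
    server = OneTimeTokenServer(baked_in)

    # The script extracts the tokens from the app and spends them all.
    script_results = [server.handle(t) for t in baked_in]

    # The real app later tries its first token and is rejected.
    app_result = server.handle("t1")
    print(script_results, app_result)
```

Note that the server only ever sees "token already used". It cannot tell whether the script or the app burned it, which is the point made above.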


Never assume something is impossible. Tokens/counters that are generated at runtime inside the app are a start at countering bots. But there are MUCH more advanced techniques, and big businesses are built upon helping mobile and web apps detect bots and other scripted requests.


Reminds me of the battle between hackers and Niantic to create scanners for Pokemon Go.



