iOS Apps Using Private APIs (sourcedna.com)
195 points by tptacek on Oct 19, 2015 | 79 comments


Hey all, I'm the founder of SourceDNA and happy to answer any questions about how we found this or about binary code search in general.

We take a different approach to understanding code than the traditional antivirus world. Rather than try to hunt for a needle in a haystack, we've created a system for finding anomalies in code that's already published. For example, you can build a set of signatures for "bad apps" and then repeatedly search for them (AV model) or you can profile what makes an app "good" and then look for clusters of apps that deviate from it (SourceDNA).

Consider an ad SDK like Youmi here. They weren't always scraping this private data from your phone. Some apps still ship an older version of this library, and that version behaves like a typical, only somewhat intrusive, ad network.

But, over time, they began adding these private API calls and obfuscating them. The change sticks out when you track the history of this code and compare it to other libraries: there was more and more use of dlopen/dlsym, with string-preparation functions called right beforehand. That's quite different from other libraries, which stick to more common calls.
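
To make this concrete, here's a minimal sketch of the pattern; it is not Youmi's actual code, and MGCopyAnswer/MobileGestalt are just a well-known example of a private call reached through dlopen/dlsym with the symbol name assembled at runtime:

    #include <dlfcn.h>
    #include <string.h>
    #import <Foundation/Foundation.h>

    // Illustration only: the symbol name is built at runtime, so a static
    // scan of the binary's string table never sees "MGCopyAnswer" intact.
    // A real obfuscator would split or encrypt the library path as well.
    static NSString *HiddenDeviceInfo(void) {
        char sym[16] = {0};
        strcat(sym, "MGCopy");   // "string prep" right before dlsym
        strcat(sym, "Answer");

        void *handle = dlopen("/usr/lib/libMobileGestalt.dylib", RTLD_LAZY);
        if (!handle) return nil;

        // MGCopyAnswer(CFStringRef) -> CFTypeRef, a private MobileGestalt call
        CFTypeRef (*fn)(CFStringRef) = (CFTypeRef (*)(CFStringRef))dlsym(handle, sym);
        if (!fn) return nil;

        return CFBridgingRelease(fn(CFSTR("SerialNumber")));
    }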

By looking for anomalies, we can be alerted to new trends, whatever the underlying cause. Then we dig into the code to try to figure out what it means, which is still often the hardest part. Still, being able to test our ideas against this huge collection of indexed apps makes it much easier to figure out what's really going on.


One concern I have is that they'll just move this one step further down the road. For example, I believe that you can get the address of a function like dlopen() by manually loading the bundle and looking up the function name (via something like CFBundleGetFunctionPointerForName()), constructing the name "dlopen()" through some obfuscated method as they do with Objective-C symbols. Then it becomes harder to detect that they're even using dlopen(). Any plans on how to detect that? It seems like an arms race that can't easily be won.


You're exactly right. You can go even further in gathering environmental data from syscalls and using it to construct strings at runtime. At some point, you have to include dynamic analysis in addition to static analysis.

The iRiS paper we mentioned in the blog post describes a really great approach to doing this. They do "forced execution" using a port of Valgrind to iOS. They also do the exact right thing by resolving as many call targets as possible statically, then using dynamic analysis only for the call sites that can't be resolved. This saves on runtime and complexity, though you might notice that even this approach didn't resolve 100% of them.

Ultimately, you're dealing with a variant of the halting problem, where the app uses a specific value only on a full moon, on iOS 4.1, where the username is "sjobs". And that's why computers are still fun.

PDF: http://www.cse.buffalo.edu/~mohaisen/classes/fall2015/cse709...


How did you get access to the binaries? I would think the App Store requires a session and valid purchase in order to download. Or is this analysis based on binaries submitted by your users?


We both download apps ourselves and scan apps that developers upload to us.


I'm actually very surprised this hasn't happened years ago. The power of Objective-C's runtime has always made this pretty straightforward.

Apple can defend against unauthorized calls even to runtime-composed method names, though. I can think of a few ways.

They could move as much "private" functionality as possible outside of Objective-C objects entirely, which requires that you know the C function name and makes it obvious when you've linked to it. This should probably be done for at least the really big things like obtaining a device ID or listing processes.

Even if they stick with Objective-C, they could have an obfuscation process internal to Apple that generates names for private methods. Their own developers could use something stable and sane to refer to the methods but each minor iOS update could aggressively change the names. If the methods are regularly breaking with each release and they're much harder to find in the first place, that may be a sufficient deterrent to other developers.

They could make it so that the methods are not even callable outside of certain framework binaries, or they could examine the call stack to require certain parent APIs. At least that way, if you want to call a private API, you have to somehow trick a public API into doing it for you.

And, I think Apple does say somewhere that developers shouldn't use leading underscores for their own APIs. They could hack NSSelectorFromString(), etc. to refuse to return selectors that match certain Apple-reserved patterns in all circumstances.


Private API isn't a security mechanism. Apple attempts to detect it, and reject apps that use it, purely to be friendlier to their users. Private API usage makes apps more likely to break with OS updates, and Apple doesn't want apps breaking.

So while Apple is motivated to catch this stuff to an extent, they aren't that motivated, and they may well be perfectly happy with the current level of sophistication.


Ironically, new iOS updates tend to break older apps anyway.

toucharcade.com/2015/09/23/bioshock-removal-2k-support-response/


Does anyone know why they aren't access protected? It seems like poor security to just hope that apps don't call these functions.


Some of these APIs are protected, and Apple keeps adding more to this list over time. For example, reading the UDID (unique device identifier) was blocked in iOS 8, and you need an entitlement to call this API. Since the list of entitlements is covered by Apple's code signature, ordinary apps from the App Store can't use them.

http://stackoverflow.com/a/27686125

The reason they don't just do this for every API is due to the sandbox design, which makes this cumbersome.


The use of entitlements to protect APIs is a matter of moving them out of process. This is a trend that has been going on for a long time, especially since the development of XPC in iOS 5.

The trend took big steps in iOS 6, when Apple created a mechanism for remote views, and moved things like sending SMS and email out of process, in addition to the mechanism for determining what process gets what touches (backboardd).

As you mentioned, MobileGestalt was another step in this direction, when Apple blocked access to the MAC address and UDID.

iOS was not originally engineered with this focus on security and sandboxing in mind. For example, CVE-2015-5880 (accessing the contents of the screen from anywhere prior to iOS 9) existed because the screen framebuffer was needed for QuartzCore to function, and Apple didn't take the time to re-engineer how things work.

The number of things you can still do from the sandbox is mind-boggling, and Apple is aware of it; they just don't have the time to re-engineer everything.


Didn't I just answer that? It's not security at all, and not meant to be.


Not really. If there are apps using these disallowed APIs, why would Apple care about breaking them by removing access? In every case where an app is discovered using the APIs, Apple removes it from the App Store. App compatibility is not the issue.

There's still no clear reason why Apple chooses to enforce this via app review, rather than having the system call return 'permission denied' to apps without the right security level.


Compatibility is absolutely why they try to catch these things in review. Apple doesn't want a situation where iOS 10 results in 90% of the top apps breaking, because users will blame Apple and their reputation will suffer.

But it is not a security thing at all.


Either way, why not just deny access? It guarantees that apps won't break in this way, removes some pain from the process of approving apps (they don't need to check for this anymore), and improves privacy / security as a side effect!


They do deny access, and they usually seem to try to do it in a backwards-compatible way. For example, if you try to get the list of installed apps by brute-forcing URL schemes, they now return "no more entries" after 50 calls.

This way, your code and most legit code that isn't trying thousands of URLs still works, while apps that are trying to do this fail but don't crash.
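
For reference, the probing technique being rate-limited looks roughly like this (a hypothetical sketch; the scheme list here is made up):

    #import <UIKit/UIKit.h>

    // Hypothetical sketch of URL-scheme probing: ask UIApplication whether a
    // known app's custom scheme can be opened and infer installation from the
    // answer. After roughly 50 distinct schemes, the system stops giving
    // useful answers, so mass enumeration quietly fails instead of crashing.
    static NSArray *GuessInstalledApps(void) {
        NSArray *schemes = @[ @"fb", @"twitter", @"spotify" ]; // made-up sample list
        NSMutableArray *found = [NSMutableArray array];
        for (NSString *scheme in schemes) {
            NSURL *url = [NSURL URLWithString:[scheme stringByAppendingString:@"://"]];
            if ([[UIApplication sharedApplication] canOpenURL:url]) {
                [found addObject:scheme];
            }
        }
        return found;
    }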


Deny access how, exactly? Strictly enforcing privilege separation within the same process is basically impossible without running lesser-privileged code in a VM or something basically equivalent.


Isn't this what entitlements do? I honestly have no idea how they work - perhaps blog material? :)


Entitlements regulate what out-of-process facilities you're allowed to access. (For example, entitlements let you access certain parts of the filesystem, or the network, or access location services.) Private APIs are regular boring in-process classes, methods, and functions which just aren't documented by Apple and aren't part of their public interface. You call them in the same way as you do a public API: by jumping to the function, or invoking objc_msgSend to send a message to an object, all within your process.
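
To make the in-process part concrete, a private method call is just ordinary message sending; here's a sketch with a made-up selector name:

    #import <Foundation/Foundation.h>
    #import <objc/message.h>

    // Illustration only: "_somePrivateStatusString" is a made-up selector.
    // Nothing in-process blocks this; the only obstacle is that review
    // tooling might spot the selector string in the binary.
    static id CallPrivateMethod(id target) {
        SEL sel = NSSelectorFromString(@"_somePrivateStatusString");
        if (![target respondsToSelector:sel]) return nil;
        // objc_msgSend must be cast to the correct signature before calling.
        id (*send)(id, SEL) = (id (*)(id, SEL))objc_msgSend;
        return send(target, sel);
    }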


I could see an object-capability based scheme working well.

Apple is probably in a unique position to actually implement such a system successfully, since they control the hardware, the OS, and even the language choice.


Sure, if Apple banned Objective-C/C/C++ and assembly code (never mind that most games are written in C++), set up a new system where all developers had to upload Swift IR, removed all unsafe APIs (at a sometimes very high performance cost), and audited all the rest of the gazillions of APIs for memory unsafety in the face of deliberate misuse, then they could implement an in-process object capability system that had only a few thousand vulnerabilities, similar to WebKit.

Or they could continue with the current sandboxing system where security- and privacy-critical functionality is performed out of process, and plug the remaining leaks, of which there aren't that many.


Because enforcing restrictions in code makes the kernel and runtime more complex.


But that's its job!

open() would be a much simpler syscall if it didn't bother to enforce access controls. The code would be a lot easier to maintain too...


It is not the runtime's job. It is the kernel's job, but the whole point of private API is that they're within the same process and don't require a syscall to invoke. Calls which require kernel involvement that you're not supposed to be able to call already have access control mechanisms so you can't call them.


You're both right. It's the kernel's job to ultimately enforce this policy, and calling private APIs should only be a compatibility problem, not a security risk.

I believe the current sandbox design makes it more complex to promote system interfaces to requiring individual access control. It doesn't provide fine-grained access control, so more work is needed to separate code at the granularity that sandbox policies can enforce.

This does happen over time (e.g. UDID in iOS 8).


In what way is the sandbox not fine grained? I thought the whole idea with entitlements was that access to individual facilities could be controlled separately.


Well yes, I meant it's the kernel's job. If the information is already in-process, then there's no access control at all and an app could just read it directly, rather than calling a private API.

I think we're talking at cross-purposes. My original question was why Apple don't restrict this restricted information. It appears we both agree that putting it behind an access control (like a syscall) would prevent this.


Right. It's confusing in a discussion about private API in general, because private API is pretty much by definition calls with no access control which are merely undocumented. Thus saying that Apple should be more clever about prohibiting private API calls is weird. But if it's just one particular call that needs better enforcement, yeah, they should do that. And it will involve promoting it beyond "private API."


> My original question was why Apple don't restrict this restricted information.

Because Apple wants to use this information, but they don't want anybody else to use this information.

If Apple requires security permission to get at the information, then they hose themselves as well.


I'm not sure I understand your argument. Because there is already some complexity in the kernel, we should be fine with adding more complexity to it?


It could use many of the existing access control methods already in the kernel. No extra kernel complexity required.

e.g. the advertising ID could be stored in a file with appropriate permissions. All the API needs to do is open the file and read it out.
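
Something like this on the consuming side, assuming a hypothetical file path; the kernel's file permissions and the sandbox profile would decide who gets an answer:

    #include <fcntl.h>
    #include <unistd.h>
    #import <Foundation/Foundation.h>

    // Sketch under the assumption above: the path is made up. If the
    // identifier lived in a protected file, unentitled apps would simply get
    // EACCES back from the kernel, with no binary scanning needed.
    static NSString *ReadAdvertisingID(void) {
        int fd = open("/var/mobile/Library/AdID/identifier", O_RDONLY); // hypothetical path
        if (fd < 0) return nil;                 // permission denied for most apps
        char buf[64] = {0};
        ssize_t n = read(fd, buf, sizeof(buf) - 1);
        close(fd);
        return (n > 0) ? [NSString stringWithUTF8String:buf] : nil;
    }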


I believe that now that apps can get the level of info shown in the article, it will become a user privacy issue, making it quite closely linked to security.


If so, then Apple will add restrictions to which apps are able to make these calls, by moving them out of process and adding actual security checks. That's how everything that's actually security-critical (calls that manipulate other processes, filesystem access, hardware access, etc.) already works.

If there are any security implications to making a private API call then this implies that Apple's sandboxing system is broken. If so then the solution is to fix the sandbox. Better detection of private API usage might be good for them, but is neither necessary nor sufficient nor particularly useful for security.


I agree that enforcement is the best way to go here. Hopefully they will improve the granularity of sandbox control to make detecting private APIs purely a compatibility issue.


It's kind of an arms race. In the early days of iOS app development, developers would simply not call the private APIs during the review period, hiding them behind an easter egg or having the client check with a server to see whether a flag had been flipped before enabling them. Ever since Apple started actually scanning the binaries, developers have had to go to greater lengths to obfuscate the usage while still giving users the most powerful features.

Some of the things developed during the arms race have become best practices, though. If you are an iOS game developer and you release a game update without the ability to adjust the game substantially from the server side, you are really in trouble. It takes another long review period from Apple to change anything, so the best practice is to control functionality from the server, so that if the game turns out to be too difficult for users or something, it can be adjusted without suffering the Apple review delay.


> I'm actually very surprised this hasn't happened years ago.

It happened years ago. About a year ago I left a company that had been doing this for at least two or three years. Their motivation was strictly a technical workaround to get functionality that Apple didn't expose via a public API, not to nefariously gather user info, but the technique is the same, and not particularly difficult to figure out.

Probably a good five years ago Apple rejected an app of mine for use of a private API (this was about the time their static analysis was implemented). Another company that sold a buttload of copies of their app had to have used the same API, and their app came out a little bit before mine. Could be Apple just missed it, but I suspect the company in question was doing something similar. (They made millions, I languished in obscurity, but I'm not bitter. <g>)

And those are just the ones I directly know about, or strongly suspect. I'm sure there are/were plenty more.


> I'm actually very surprised this hasn't happened years ago.

It has. Very early on, Apple didn't check for their usage, so developers started using class-dump results to look for interesting private/undocumented APIs.

Things went south when Apple started checking and rejecting applications which did that: https://www.cocoanetics.com/2009/11/forbidden-fruit-apple-ap...

The Youmi stuff is creepy, but generally speaking it's more a question of there being no guarantee about these APIs' stability or continuity, so the application may break between even minor OS updates, which is undesirable.


What's so magical about Obj-C's runtime that this cannot be detected by Apple? If a third-party can scan for them, why can't Apple? If you're going to have a walled garden, by all means please ensure the walls can't be scaled or dug under.


AFAIK the combination of NSSelectorFromString and NSClassFromString allows you to build and make calls from obfuscated strings at runtime. Since Objective-C uses message passing, this is non-trivial to catch. Compared to C, where you must explicitly reference symbols that are easy to check for automatically, detecting this kind of API use is more difficult.

I am not sure if you have to explicitly link against certain frameworks/dylibs, though; someone with more knowledge, feel free to correct me.
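
Roughly like this; the class here (LSApplicationWorkspace, a well-known private class) and the string splitting are just for illustration, and a real SDK might XOR or encrypt the fragments:

    #import <Foundation/Foundation.h>
    #import <objc/message.h>

    // Illustration: the class and selector names are assembled from fragments
    // so neither "LSApplicationWorkspace" nor "defaultWorkspace" appears
    // intact in a strings(1) dump or a class-dump of this binary.
    static id CallHiddenAPI(void) {
        NSString *clsName = [@"LSApplication" stringByAppendingString:@"Workspace"];
        NSString *selName = [@"default" stringByAppendingString:@"Workspace"];
        Class cls = NSClassFromString(clsName);
        SEL sel = NSSelectorFromString(selName);
        if (!cls || ![cls respondsToSelector:sel]) return nil;
        // objc_msgSend avoids any compile-time reference to the private
        // class or selector.
        return ((id (*)(id, SEL))objc_msgSend)(cls, sel);   // +defaultWorkspace
    }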


The article touches on how to dynamically link into a framework/dylib at runtime with dlopen(3) and dlsym(3) to resolve its symbols' addresses.

I've always wondered why Apple doesn't run all apps against a "debug" build of iOS that asserts that the caller of private APIs is itself internal/private to Apple, but instead relies on something akin to grepping the output of strings(1)


This seems like a good approach, but you can still get around it by detecting Apple's review environment and not doing anything with private APIs during it.


It's quite trivial to set up a method that uses NSSelectorFromString [1] etc. and have it read from a JSON payload sent from a server. It's very hard for Apple to check for that unless they actively monitor apps after the review process.

[1] https://developer.apple.com/library/ios/documentation/Genera...


Right. Any runtime behavior can be altered by observed state from outside the phone.

There's even a paper on intentionally inserting security flaws into your code and then exploiting them from your own server to change execution patterns:

https://www.usenix.org/conference/usenixsecurity13/technical...

Ultimately, you need to enforce access control instead of just trying to detect problems a priori. Apple's sandbox is a great start to that, and I expect they'll keep improving it to block apps like these.


The state doesn't even have to be from outside the phone. You could have an internal timer that kicks off 2 weeks after you've submitted to the App Store (to allow for unexpected delays) and switches on the evil behavior.


We don't know why these apps got through in particular. However, I do think we have a unique vantage point by being outside any particular app store.

We scan code on both iOS and Android, identifying libraries and patterns of behavior. OpenSSL, for example, is the same code, regardless of platform, but the versions of it may vary depending on packaging and porting. We found one guy who had compiled OpenSSL for Android and was distributing binaries from github. Not the safest way to get your software.

As a third-party, you can move quickly and correlate info across app stores. But I also think we've developed a pretty unique code analysis engine that neither Apple nor Google has. :-)


I'm sure they can, and will probably start doing so.


dlopen() can be used to call a C function by a string name.

More ambitious programmers can scan executable regions of memory for the instruction stream of the target function and call the address directly without any help from symbolic names at all.


Correct. Even easier, they can store a hash of the symbol name and walk the dyld cache until they find a match. No dlsym required.

The runtime isn't as straightforward as SourceDNA makes it seem.


I didn't say the function would be in a dynamic library, although you're right that this would work if it was.

A statically-linked function would have to be referenced directly in the executable somewhere.


> They could move as much "private" functionality as possible outside of Objective-C objects entirely, which requires that you know the C function name and makes it obvious when you've linked to it.

https://stackoverflow.com/questions/6530701/is-the-function-...


I wonder if it's possible for the ObjC runtime to randomly obfuscate a process's "private" selectors on startup, and rewrite ObjC invocations within system frameworks dynamically, to make private API usage harder.

Like what ASLR does for memory addresses, but for selectors.


Seems like a lot of the things they're putting behind private APIs should instead (or also) be behind a user permission. Getting the list of installed apps, the device serial number, and the user's email address shouldn't be protected simply with obfuscation.


Yeah, the user's email address is a surprise. It's pretty easy to bypass a simple scan with objc_msgSend.

Conversely, it would be nice if the user could grant access to email and iMessage sandboxes. This would allow us to apply machine learning to personalize services. Ironically, by allowing opt-in, Android is an easier platform for creating private services.


I'm not an iOS developer (well, not really; I don't know what I'm doing), but this seems like it would be a really easy thing for Apple to detect. Does Apple simply not care enough about access to these to add better checking, or is there something fundamental about their platform that makes checking seriously difficult for Apple?


Objective-C works by sending a message to an object. If that object responds to the message, it does something. The message is essentially just a string, and in fact you can use strings to build selectors.

Normally, these private API selectors would show up in a class dump and Apple will reject your app. But if you're clever, you can hide them from a class dump. You could encrypt the strings, then decrypt them at runtime, and Apple could no longer find your private API usage in a static scan.

At runtime they could detect you calling private APIs, but it would be easy enough to code it so that you don't call any private APIs for a few days after first launch, or to turn them on with a server-side flag. That way Apple would never notice the private API usage during an App Store review.


Your same argument would stand without modification for reading all data from the keychain, accessing files saved by other applications, using your location, or any other things that require permissions: clearly, then, it is wrong on the face of it, because Apple has simply failed to secure a wide class of things from these processes. It honestly doesn't matter whether there is an "API" for it at the C or Objective-C layers, and that is a red herring: the issue is that there is an IPC layer somewhere (whether a Mach server or a file on disk) that exposes an API by which this data can be obtained by this process.


Exactly right. What we expect to happen next is runtime generation of an XPC identifier to call the underlying service, replicating the behavior of the private API with none of its code.

We're moving ahead with detecting this too as it's an obvious next step.
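
For the curious, that would look roughly like the sketch below, using the libxpc C API; the service name and message keys are placeholders, and a real attempt would assemble them at runtime so they never appear in the binary's strings:

    #import <Foundation/Foundation.h>
    #import <xpc/xpc.h>

    // Sketch only: "com.apple.example.daemon" and the message keys are
    // placeholders. The point is that the mach service behind a private API
    // can be messaged directly, with none of the API's own code involved.
    static xpc_object_t QueryDaemonDirectly(void) {
        xpc_connection_t conn = xpc_connection_create_mach_service(
            "com.apple.example.daemon", NULL, 0);
        xpc_connection_set_event_handler(conn, ^(xpc_object_t event) {
            // Connection errors and unsolicited messages arrive here.
        });
        xpc_connection_resume(conn);

        xpc_object_t msg = xpc_dictionary_create(NULL, NULL, 0);
        xpc_dictionary_set_string(msg, "command", "get-info");  // placeholder
        return xpc_connection_send_message_with_reply_sync(conn, msg);
    }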


The question is why don't these APIs require authorization in the first place? Obfuscation is a terrible security practice.


Because as soon as Apple publish something as an API, they can't change the method because third party apps could now be depending on it. As long as the API is private and undocumented, Apple can change the method from release to release as they see fit.


But what about making them private APIs that also require authorisation?


Many do require this, but there are current sandbox limitations that make this harder to change quickly.


I imagine it's for performance reasons: every Obj-C call goes through the runtime, and adding extra instructions there would massively hurt performance.


> Since we also identify SDKs by their binary signatures, we noticed that these functions were all part of a common codebase, the Youmi advertising SDK from China.

> We believe the developers of these apps aren’t aware of this since the SDK is delivered in binary form, obfuscated, and user info is uploaded to Youmi’s server, not the app’s.

Know your binaries?


Which is impossible with any app of reasonable size that uses third party SDKs (which are often required for business reasons). Apple didn't catch it, how can we expect hundreds of small developers to catch it?


Well, that's what we do for a living -- help thousands of small or large developers find these problems ahead of time.

It's not an easy problem and not something you can do by hand. We've indexed the code in millions of apps and learn patterns from it. For example, you can discover a new SDK being adopted by apps in one region purely from the co-occurrence of binary signatures.

We're working hard to give developers this kind of insight, helping them detect problems in their apps before they are affected.


Apple spends a few minutes reviewing any given app. The developers have much more time to examine this stuff.


I disagree that developers can be expected to find this kind of thing on their own, say by digging in with a debugger. It's simply not a scalable solution.

I think you need an app-store-wide view of the entire software world and the ability to query it for arbitrary behavior. But I'm biased since that's what we've spent years building. :-)


In the case of the SDK in question, how much due diligence on the developers' part would you say is appropriate? Not necessarily binary analysis, but more along the lines of knowing who you're getting your tools from, reputation, whatever, etc.

Because as far as I, a user, am concerned, that developer put that software on my phone.

Maybe I as a user have a similar due diligence burden. One thing I might do is never download an app from the affected developers again. But that doesn't seem like a desired outcome. Nevertheless, I don't see how a user could do anything more fine-grained and nuanced than that.


Really great question. Certainly, there are "too good to be true" offers out there you might want to be concerned about, such as ad networks that pay crazy high CPMs.

I'm biased, but I think developers should be using our service to track the code that they're putting in apps (including their own code) for security, quality, and app review problems. We watch third-party code for them, which is how we found this issue. I think it's unreasonable to expect developers to reverse-engineer every library they include.

I agree users really can't do as much about this. We are considering ways to distribute the list of vulnerable apps/versions to help users find out if they have them.


This is not new. Check out "Microsoft AARD code" -- an inverted example of surreptitious analytics, in 1992. TL;DR: the beta version of Windows 3.1 showed a warning if user was using DR-DOS, a competing OS. The payload was encrypted and could be triggered in the production version of Win 3.1 by changing a flag.


Yea, there was a reason why I mentioned DR-DOS when I was discussing the OS/2 2.0 fiasco.


"The apps using Youmi’s SDK have been removed from the App Store and any new apps submitted to the App Store using this SDK will be rejected." I'll be interested to see what happens to Youmi now that they're blocked from iOS. SDK developers: Consider yourselves warned.


I imagine that new versions with the private API stuff removed will be allowed. I don't believe that this is some sort of lifetime ban on Youmi.


How did SourceDNA have access to millions of iOS app binaries? Can anyone just download all the apps in the App Store?


I think you can download the .ipa file using iTunes. Alternatively, use iFunbox to back up apps.


AFAIK you still have to run those on a jailbroken device, letting the kernel decrypt the __TEXT segment for you, then dumping the decrypted binary from memory to disk. Every binary is encrypted the same way no matter the device, though, so supposedly there may be some key you can extract to enable decryption off-device.


The blog is about 2 groups discovering the bad apps and reporting them to Apple, but then: "Apple has issued the following statement: 'We've identified a group of apps that...'" Stay classy, Apple. Great attribution.


Is it just me, or are they totally ripping off the research done by the iRiS team and making it sound like they came up with these vulnerabilities themselves? I know they give the researchers a cursory mention, but it's buried at the bottom of the article.


No, that's not at all what they did. They discovered the issue independently and, as you can see if you read the whole article, in a manner very different from that of the other team.

(Fair warning: I am not just a disinterested observer of SourceDNA).



