Hacker News new | past | comments | ask | show | jobs | submit login

These "special" strings that Log4j parse must be in the formatting string though, right?

External Strings should normally be logged as parameters, not included in the format String. For example:

    // this is ok
    log.debug("user-agent={}", userAgent);

    // this is bad
    log.debug("user-agent=" + userAgent);
Does this vulnerability still work on the first case?

EDIT: the answer is yes, just tried it myself.




That's stunning. People are screaming about SQL injections and such for decades now, every "programming 101 for complete doofuses" course has a chapter about it in the first ten pages, we have tools upon tools to detect patterns of using untrusted data as control... And yet, one of the most popular logging toolkits in one of the most popular languages has it built in as a feature - literally using untrusted and unfiltered outside data as a pattern - and it takes until 2021 (almost 2022, happy new year!) to realize this is bad??? There were so many problems with untrusted data used as control flow, how it didn't raise any alarms before?


I mean it's so inexplicably bad that it's hard to imagine it being an innocent mistake.

You have to wonder if opening a ticket for such a feature then having someone (or yourself under another account) build it in such an egregious way is a possible vector for deliberately creating such exploits.

If this feature was default enabled, then it's even more suspect. It's just such an esoteric thing.

When you factor in this kind of thing with recent revelations about backdoored NPM packages, you have to wonder if OSS is even tenable. I don't want to go all the sky is falling here, But the default model of assuming that someone's paying attention doesn't seem to cut it. It works well in most cases, but it's that 2% or 5% that's a doozy.


OSS definitely has a reliability and credibility problem on its hands. Codes of Conduct that prioritize belonging over, say, competence and truthfulness is not helping. We need more people like the old Linus that are more concerned with the quality of the artifact than increasing the size of a community


> We need more people like the old Linus that are more concerned with the quality of the artifact than increasing the size of a community

There is a difference between (a) maintaining high code quality and review standards and (b) yelling at people and/or degrading them.

With this in mind, think through the first-order and second-order effects that happen when toxic leadership behavior is allowed, tolerated, or encouraged in an open source project.

Toxic leadership in an open source project, all other things equal, harms the project and the people.

Of course, historically, there are cases where intelligent and committed people exhibit toxic behaviors -- but this is not a pattern of behavior for us to aspire to. Quite the opposite.

I understand that many people (including software developers) struggle in interpersonal interactions. I don't demonize people when they are unaware or lack the tools to treat others with respect. But these behaviors have huge negative effects, so it reasonable and beneficial to have fair, humane codes of conduct to mitigate such negative patterns of behavior.


I think we need fewer people that somehow manage to mangle logic to such a degree that they blame security vulnerabilities on the willingness of some project to state that it's generally preferred to treat each other with some respect.

Linus' insults were always cringeworthy. Sometimes they may have been justified, at other times they were just needlessly hurtful and revealed some sort of weakness on his part. In any case, he always had the power to say no with just that single word.

Log4J doesn't even have a code of conduct in its source tree, though I guess Apache probably has one. You'll get a free Log4j update if you find any instances of it having an impact on the project.


> Codes of Conduct that prioritize belonging over, say, competence and truthfulness is not helping.

What is your reasoning for suggesting that (i) codes of conduct are in tension with (ii) competence and truthfulness?


> Codes of Conduct that prioritize belonging over, say, competence and truthfulness is not helping.

Example, please?


That is a pretty far out security assessment. You have the same exploits in commercial software, just security through obscurity helps because of far less usage.


>That is a pretty far out security assessment

Yeah, I kind of thought that too as I was typing it, but it's so egregiously bad that I couldn't get around that conclusion.

>You have the same exploits in commercial software

Sure, virtually all software is susceptible to exploit. But, if someone were sending unsanitized strings to a database, and without using query parameters, etc., then we would all agree that's inexcusable after all of these years of knowing better.

IMO, this is in that category.


Alexander Dumas once said:

> Une chose qui m'humilie profondément est de voir que le génie humain a des limites, quand la bêtise humaine n'en a pas.

So, never discount human stupidity :-(


Sounds quite sensationalized and the other way round tbh. Nobody 10000 years ago would have imagined how far we’ve come nowadays as a whole, while stupidity would be more likely limited by the circumstances and conditions of the time.


According to a comment here it does https://news.ycombinator.com/item?id=29506397


Confirmed myself here... yes, it does :(


wow it's like SQL injection but even when using user input as parameter it does not get sanitised. Really curious what was motivation for such behaviour.


Here is the jira ticket that introduced the behaviour (jndi lookup) : https://issues.apache.org/jira/browse/LOG4J2-313


That explains the jndi lookup, but not why variables are parsed when they are not part of the format string. That just asks for unexpected issues and exploits.


JNDI lookup in it self is not a problem, problem is that user input is not sanitised and can include templates which can have JNDI lookup in them. I would expect user input with {} template symbols to be escaped and not evaluated.


Maybe, but still this seems as vanity feature added because "It would be really convenient"... this wasn't something what was needed, but something what was added to make life of maybe 0.1% of users little bit easier. My guess is that most of users of Log4J2 don't even know that it is able to do such magic, and would be horrified knowing it. IMHO Log library should log, not do some magic stuff.


I am indeed horrified


The method responsible for variable substitution is here [1].

There are other lookup mechanisms, (didn't check them all) but they only retrieve environment/static values (this one [2] for example retrieves kubernetes attributes). I think the jndi one is the only one that load and execute code.

Edit : I think my understanding of the docs is incorrect so ignore the next paragraph

From the documentation [3], I have the impression that these variables should be evaluated only when loading the pattern layout configuration. But in reality they are also evaluated when formatting the final log message (log.info(...)).

I agree that the input should be sanitized but only if the formatting behavior is a bug and was not intentional.

One possible explanation for the current situation, is that developers assumed that all lookup mechanism will retrieve only static values (env variables for example).

And then another dev introduced the jndi lookup which execute code, but no one noticed the impact on the already existing behavior (evaluation variable when formatting the final msg).

Edit

1: https://github.com/apache/logging-log4j2/blob/9df31f73b62ba2...

2: https://github.com/apache/logging-log4j2/blob/c2b07e37995004...

3: https://logging.apache.org/log4j/2.x/manual/lookups.html


> I agree that the input should be sanitized but only if the formatting behavior is a bug and was not intentional.

Non-pattern arguments should not do any substitution, because otherwise developers have to jump through hoops to output strings verbatim. You don’t want "Invalid identifier: '${<some valid log4j syntax>}'" to be turned into "Invalid identifier: '<the log4j replacement>'" when the actual invalid identifier (e.g. from user input) was the "${…}" syntax. I’m surprised that log4j behaves that way still after two decades.


I was so surprised by the behavior your comment describes that I didn't believe it, but it's true. And it's not a bug, they do this on purpose!

From https://logging.apache.org/log4j/2.x/log4j-core/apidocs/org/...:

> Variable replacement works in a recursive way. Thus, if a variable value contains a variable then that variable will also be replaced.

And an example where this caused problems for someone:

https://www.tasktop.com/blog-under-construction/log4j-2-the-...


Recursive replacement is somewhat of a WTF yes but not really in a causitive relationship ro this vulnerability, right? The main cause is the order in which ${} variables are evaluated. They are evaluated after substitution of `{}` in the format string, instead of before. That's the key behavior problem.

A simple rule such as "If you evaluate a placeholder of type `{}` you should stop evaluating further recursively" would maintain most of existing behavior while only removing vulnerable behavior.


> I agree that the input should be sanitized but only if the formatting behavior is a bug and was not intentional.

The behavior is fundamentally wrong as explained by layer8's comment and in the blog post jcheng linked. Apparently it was intentional, and to me that's a million times worse than if it were a bug that could just be fixed.

log4j is unsuitable for use in a way that many many people are suddenly discovering today. The log4j developers need to rethink this, and even if they do, log4j's users should still strongly consider switching to something else. Otherwise they need to audit the whole codebase for other surprising, broken, insecure design decisions not only in its present state but also on each update. I don't see how it's possible to trust the log4j developers' design sensibilities after this.


> Apparently it was intentional, and to me that's a million times worse than if it were a bug that could just be fixed.

I've been on an 'archaeological dig' into the Log4j commit history, and sure enough, there's evidence that the formatting behavior was intentional from the outset.

My write-up is here: https://jedwidz.hashnode.dev/log4j-vulnerability-what-the-fa...


That's way worse than that - that's like SQL server looks into your data and if it sees something that looks like SQL it executes it right there. I wonder how it took until now to find an exploit...


Yup, log injection is something that I've been hearing a lot about recently, but I really only thought about it being able to affect systems that consume the logs. This approach is pretty ingenious.


That's very surprising.


Though of course "debug.log(stuffIGotFromPeer)" is also very common (and as you point out should always be avoided).


If "debug.log(stuffIGotFromPeer)" is dangerous, then the problem is with debug.log and not my code.

Safe logging is not too high of a bar.


The point is that the first argument is the format string. debug.log("{}", stuffIGotFromPeer) should generally be safe (this bug is an example of when it isn't, though)


The fact that the first argument is interpreted as a format string even when you aren't supplying format arguments is a violation of 'principle of least surprise'. The availability of format parameters that can access environment data - let alone remotely loaded code - is a feature most people won't discover unless they go looking for it.


How are varargs implemented in the JVM ABI? Is the called function even aware of how many parameters were actually passed, or does it have to rely on the format string for that?


Turned into an object[] array.

The log4j API implements a few overloads of each log method to help avoid the implicit array allocation happening in common logging cases (no args, one arg, etc).

The call site can tell if args were provided.


The format string should not allow remote execution unless explicitly tuned on either.

Logging should not be the thing you have to sanitize.


TBH, this is a huge antipattern. You have to put some context in your logs, so it must be something like "debug.log("Got from peer: {}", stuffIGotFromPeer) or something like that. You can't complain printf(stuffIGotFromPeer) is dangerous, so here is the same. But in log4j, debug.log("Got from peer: {}", stuffIGotFromPeer) is as unsafe as the other one, too! There's no way to make it safe, as I understand.


Don't see why not. Evaluate the `${}` stuff first, then do the `{}` substitutions.


Fortunately (and for reasons I can't begin to understand), debug and info logging levels seem to be safe, but error is not. Technically better, but somehow... morally worse?


Trying to put my head around, why is this log.debug("user-agent=" + userAgent); bad?


Because the string concatenation requires allocation of a new string, which will need to be garbage collected, regardless of whether or not the log.debug actually needs it.

Using a format and args lets you call the method with only references to existing objects, no additional string needs to be allocated unless the log method actually needs to generate the string to log (and it might even be able to use streaming to output the log and never even allocate the string)

When you’re doing things like putting trace logs with all your parameters in at the top of every method call, the memory and GC pressure of generating unnecessary strings can be significant.


ok, so its only gc overhead, and no security issue with it?


The first argument is code and the rest of the arguments are data, much like an SQL statement and its parameters. You could try to prove that whatever interprets the code in the first argument will never do anything dangerous no matter what it's supplied with, but then someone might add that dangerous feature later, as happened in this case.

To make it always work correctly, don't pass the data values as code. Although apparently[1] Log4j complicates this by mixing code with data even if you separate them, unless you tell it not to by saying "$m{nolookups}" instead of "%m".

[1] https://www.tasktop.com/blog-under-construction/log4j-2-the-...


No. I don’t think anybody generally expects log message parameterization to do anything like escaping or white space normalization or anything to the parameters that wouldn’t also be done to the message string.

If you are using a logger to output a message which you want to be able to parse based on delimiters, say, it would be up to you to escape any parameters you were incorporating into it to ensure they don’t confuse your parser.


Generally, most logging frameworks have two parts, the format string, and the parameters. A good logging library will also avoid calling str()/toString() if the log isn't emitted (for example, if it's a debug log but minimum level is info).

You have something similar when building database queries, generally you should have a base template into which you insert arguments. The library generally should take care of escaping things and also preventing things like SQL injections.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: