
The problem is that obfuscated bash is considered normal. If unreadable bash was not allowed to be committed, it would be much harder to hide stuff like this. But unreadable bash code is not suspicious, because it is kind of expected. That’s the main problem in my opinion.


Lots of autogenerated code appears "obfuscated" - certainly less clear than if a programmer had written it directly.

But all this relies on one specific thing about the autotools ecosystem - that shipping the generated code is considered normal.

I know of no other build system that does this? It feels weird, like shipping cmake-generated makefiles instead of just generating them yourself, or something like scons or meson being packaged with the tarball instead of requiring an external installation.

That's a lot of extra code to review, before you even get to any kind of language differences.


> Lots of autogenerated code appears "obfuscated" - certainly less clear than if a programmer would have written it directly.

That's why you don't commit auto-generated code. You commit the generating code, and review that.

Same reason we don't stick compiled binaries in our repositories. Binary executables are just auto-generated machine code.


But then you have the problem that enabled this backdoor. It's normal to have uncommitted, autogenerated, unreadable shell code in the tarball. Nobody is going to review it, it was just generated by automake, right? That makes it easy to sneak in a line that does something slightly different. At least with cmake you have none of this nonsense: people need cmake to build the project, it doesn't try to save users from that by generating a ton of unreadable shell code.


> Nobody is going to review it, it was just generated by automake, right?

Well, there's your problem. If you have unreviewed code, anything can be snuck in. Doesn't really matter too much where in your system the unreviewed code is.

> It's normal to have uncommitted autogenerated unreadable shell code in the tarball.

You need to review everything that goes into the tarball. Either directly, or indirectly by reviewing the sources it gets built from. (And then making sure that your build process is deterministic, and repeated by a few independent actors to confirm they get the same results bit for bit.)
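As a minimal sketch of that "bit for bit" check (hypothetical directory names; a real project would compare against a tagged git tree):

```shell
# Sketch: verify that a release tarball contains exactly the reviewed tree.
mkdir -p demo/repo && printf 'echo hello\n' > demo/repo/script.sh
tar -czf demo/release.tar.gz -C demo repo       # "upstream" builds the tarball
mkdir -p demo/check && tar -xzf demo/release.tar.gz -C demo/check
diff -r demo/repo demo/check/repo && echo "tarball matches tree"
```

In the real xz case, a check like this would have flagged the malicious build-to-host.m4, which existed only in the release tarballs and never in the repository.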


This probably varies widely, because unreadable bash is absolutely not considered normal, nor would pass code review in any of my projects.

On a slightly different note, unless the application is written in python, it grosses me out to think of writing scripts in python. IMHO, if the script is more complex than what bash is good at (my general rule of thumb is: do you need a data structure like an array or hash? then don't use bash), then use the same language that the application is written in. It really grosses me out to think of a rails application with scripts written in python. Same with most languages/platforms.
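A tiny illustration of that rule of thumb (assumes bash 4+, since associative arrays are not POSIX sh):

```shell
#!/usr/bin/env bash
# The moment you need a hash/map, bash demands version-specific syntax:
declare -A counts                                # associative array, bash 4+ only
counts[requests]=1
counts[requests]=$(( counts[requests] + 1 ))
echo "${counts[requests]}"                       # prints 2
```

Once a script reaches this point, a general-purpose language usually expresses the same thing with less fragility.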


What if your application is written in Rust or C? Would you write your build scripts in these languages, too? I would much prefer a simpler scripting language for this. If you’re already using a scripting language as the main language, you don’t necessarily need to pull in another language just for scripts, of course.


Writing a build script in Rust is fine-ish.

Writing anything in C is a bad idea these days, and requires active justification that only applies in some situations. Essentially, almost no new projects should be done in C.

Re-doing your build system, or writing a build system for a new project, counts as something new, so should probably not be done in C.

In general, I don't think your build (or build system) should necessarily be specified in the same language as most of the rest of your system.

However I can see that if most of your system is written in language X, then you are pretty much guaranteed to have people who are good at X amongst your developers, so there's some natural incentive to use X for the tooling, too.

In any case, I would mostly just advise against coding anything complicated in shell scripts, and to stay away from Make and autotools, too.

There are lots of modern build systems like Shake, Ninja, Bazel, etc that you can pick from. They all have their pros and cons, just like the different distributed version control systems have their pros and cons; but they are better than autotools and bash and Make, just like almost any distributed version control is better than CVS and SVN etc.


C is probably the best example where I would be fine with scripts in Python (for utility scripts, not build scripts). Though, if it were me I'd use Ruby instead as I like that language a lot better, and it has Rake (a ruby-ish version of Make), but that's a "flavor of ice cream" kind of choice.


Also, zig is a good example of a non-scripting language that does this job.


build.rs is a thing FYI


or make.go; for some projects it makes sense not to add another language for scripting and building tasks. It's way easier when nobody has to master multiple languages.


I think the main issue is autotools tries to support so many different shells/versions, all with their own foibles, so the resulting cross-compatible code looks obfuscated to a modern user.

Something built on python won't cover quite as wide a range of (obsolete?) hardware.


Python actually covers quite a lot of hardware. Of course, it does that via a nightmare of an autotools-generated configure script.

Granted, you could do the detection logic with some autotools-like shenanigans, but then crunch the data (i.e. run the logic) on a different computer that can run reasonable software.

The detection should all be very small self-contained short pieces of script, that might be gnarly, but only produce something like a boolean or other small amount of data each and don't interact (and that would be enforced by some means, like containers or whatever).
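A sketch of what such a self-contained probe might look like (the check itself is hypothetical; the point is that each probe emits only a single yes/no token for the aggregating logic to consume):

```shell
#!/bin/sh
# Hypothetical probe: is there a usable 'sh' on PATH?
# Output is a single token and nothing else; no side effects, no interaction
# with other probes.
if command -v sh >/dev/null 2>&1; then
  echo yes
else
  echo no
fi
```

Because each probe's output is so constrained, reviewing it (or sandboxing it) stays tractable even if the probe's internals are gnarly.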

The logic to tie everything together can be more complicated and can have interactions, but should be written in a sane language in a sane style.


...The main problem is some asshat trying to install a backdoor.

I use bash habitually, and every time I have an inscrutable or non-intuitive command, I pair it with a comment explaining what it does. No exceptions.

I also don't clean up after scripts for debuggability. I will offer an invocation to do the cleanup though after you've ascertained everything worked. Blaming this on bash is like a smith blaming a hammer failing on a carpenter's shoddy haft... Not terribly convincing.

There were a lot of intentionally obfuscatory measures at play here, and tons of weaponization of most conscientious developers' adherence to the principle of least astonishment: homoglyph-style tricks (easy-to-mistake filenames and mixed conventions), degenerate tool invocations (using sed as cat), excessive use/nesting of tools (an awk script for the RC4 decryptor, the tr), and, to crown it all, malicious use of test data!!!
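For instance, "using sed as cat" means invoking sed with a no-op program so it merely copies its input: the output is identical to cat's, but the command looks like a transformation and invites less scrutiny (file name here is made up):

```shell
printf 'line1\nline2\n' > data.txt
cat data.txt        # the honest way to print a file
sed '' data.txt     # empty sed program: byte-for-byte the same output
```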

As a tester, nothing makes me angrier!

A pox upon them, and may their treachery be returned upon them 7-fold!


> Blaming this on bash is like a smith blaming a hammer failing on a carpenter's shoddy haft... Not terribly convincing.

If your hammer is a repurposed shoe, it's fair to blame the tools.


> pair it with a comment

A good practice, but not really a defense against malice, because if the expression is inscrutable enough to really need a comment, then it's also inscrutable enough that many people won't notice that the comment is a lie.
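A contrived example of that failure mode (hypothetical file and pattern):

```shell
printf 'keep this\nsecret key here\n' > config.txt
# "Normalize line endings" -- the comment lies: this actually deletes any
# line containing 'key', and a reviewer skimming the comment won't catch it.
sed '/key/d' config.txt
```

The inscrutable expression and the comforting comment together are worse than the expression alone, because the comment actively steers review away.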


Nothing short of reading the damn code is a defense against malice. I have yet to have any luck in getting people to actually do that.


Let's not get into an all-or-nothing fallacy here; my point is that there is an important security difference between:

1. Languages where obfuscation techniques look a lot like business as usual.

2. Languages where obfuscation techniques look weird compared to business as usual.

The presence or lack of comments in #1 situations won't really help to bridge the gap.


You assert a dichotomy where no split really exists. At the end of the day, things like tr or xz or sed invocations are not part of bash as a language. They are separate tool invocations. Python or any other programming language could have things hidden in them just as easily.

And the other major issue here, is that xz is a basic system unit, often part of a bare bones, ultra basic, no fluff, embedded linux deployment where other higher level languages likely wouldn't be. It makes sense for the tools constituting the build infra to be low dependency.

And yes. In a low dependency state, you have to be familiar with your working units, because by definition, you have fewer of them.

Unironically, if more people weren't cripplingly dependent on the luxuries modern package managers have gotten them accustomed to, this all would have stuck out like a sore thumb. Which it still did, once people actually looked at the damn thing.


“Why can’t humans just be better computers!” is a naive exercise in futility. This is not the answer, and completely ignores the fact that you yourself certainly make just as many mistakes as everyone else.


I'm not asking people to be better computers. Nor do I believe myself to be blissfully free of making mistakes. I'm saying, from an information propagation standpoint, it is 100% impossible for you to make a knowledge-based statement without having first laid eyes on the thing you are supposed to make knowledge-based statements about.

This fundamental limitation on info prop will never disappear. There is nothing harder to do than to legit get somebody to actually read code.



