Javascript minification seems like a classic case of premature optimization. I'm not a front-end developer so maybe I'm missing something, but wouldn't simple, transparent gzip compression via your webserver — one line in a config file — get you something like ~90% of the benefits of minification, for free?
I'm sure there are some additional gains to be made from knowing the syntactical structure of the language you're minifying, since no generic compression scheme can know that. But that seems really marginal — and, as the article in question shows, opens up potential security vulns.
It depends on why you're minifying, but there are good reasons to do so. I'd bet, as in guess/speculate without evidence, that the #1 reason people do it is code obfuscation. Front-end JavaScript delivers your source code to users and competitors alike, and that can be scary when you're working hard on something you think people will like and copy.
Most here are talking about uglify, which does fairly superficial minification, but Google's Closure Compiler is in the mix too, and it provides static type checking, nested variable flattening, dead code removal, inlining optimizations, and more. These things are substantial, not marginal, especially the static type checking. It's closer to C++ compilation than it is to gzip.
Gzip can be used (and usually is) either way, and generally still has some benefit even on minified code. But no, gzip can't replace or provide the majority of the benefit of a minification pass; they're two mostly independent things done for mostly different reasons.
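To make "substantial, not marginal" concrete, here's an illustrative sketch (all names made up) of the kind of whole-program work Closure does beyond stripping whitespace:

```js
var DEBUG = false;

function square(x) { return x * x; }

function unusedHelper(s) { return s.trim(); }   // defined but never called

if (DEBUG) {                                    // statically known to be false
  console.log('starting up');
}

alert(square(3));

// A Closure-style pass can drop unusedHelper and the DEBUG branch entirely,
// inline square(3), and constant-fold it, leaving roughly: alert(9);
```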
If you use the Closure Compiler in Advanced mode, it will rename and rescope pretty much everything except native methods and string literals (it creates lots of globals because it assumes you're compiling all of the page's code at once). It will eliminate unused code even from third-party libs like jQuery, and do all sorts of things that make the beautified code very expensive to understand, follow, and reverse. Even then you may end up with only half a library, the half that does one specific task, with the rest of the code simply missing.
You have to add a bit of JSDoc annotation in some places to fix breakages, and restructure the code a tiny bit to get it to rename absolutely everything, but the results are excellent.
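For anyone who hasn't used it, the annotations in question look roughly like this (a minimal sketch; the function is made up). Type information lets the compiler check calls and rename aggressively, while names that outside code still needs must be kept reachable, e.g. via externs, exports, or string-keyed access:

```js
/**
 * @param {number} width
 * @param {number} height
 * @return {number}
 */
function area(width, height) {
  return width * height;
}

// Anything external callers rely on by name must stay reachable under a
// stable name, e.g. by assigning it with a string key so it isn't renamed:
window['computeArea'] = area;
```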
> it will eliminate unused code even from third party libs like jquery and do all sorts of stuff that make the beautified code very expensive to understand
To have that sort of optimisation you'll have to run the Closure compiler with ADVANCED_OPTIMIZATIONS enabled.
Last time I checked, that mode of compilation didn't work with jQuery, nor with most other libraries. To use it with your own code you'll have to follow certain rules.
I think most people just use uglify or the Closure compiler in SIMPLE_OPTIMIZATIONS, neither of which does the very advanced stuff of ADVANCED_OPTIMIZATIONS.
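As a rough illustration of one of those rules (example made up): in ADVANCED mode, dotted property accesses get renamed while string-keyed accesses don't, so mixing the two styles on the same property silently breaks the compiled code.

```js
var config = {};
config.timeout = 500;            // dotted access: may be renamed to config.a
console.log(config['timeout']);  // string key: not renamed, so this logs
                                 // undefined after advanced compilation

// Safe pattern: pick one access style per property and stick to it,
// and export any name that external (uncompiled) code needs to see.
```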
Tried it out and got an error saying it doesn't support ES5 getters/setters unless I change the language_in option. Is there a way to do this?
Even so, there's nothing it can do to bring back eliminated code, meaningfully extract inlined functions, or scope everything appropriately to help a reverser understand context. It would still take an immense amount of effort only to get back something very specific.
The only thing you can probably get out of it is some largely mathematical chunks that can't be restructured much, e.g. rgbToHsv.
Here's part of Google's homepage JS thrown at it (in reality I tossed in a whole file, but Pastebin won't fit it all for free). It's still entirely unreadable.
I think the main reason it's considered a best practice to minify JavaScript files is the smaller download size for the client. With a smaller download, the page can start executing JavaScript sooner and is ready for the user to interact with earlier. Minifying and gzipping JavaScript yields a far smaller file than doing neither. Big win for performance.
While it's true that in practice combining compression and minification gives the best size, LZ77 already removes much of the redundancy that minification targets, so running both has only a marginal benefit over compression alone.
Another reason is that minified code + gzipped code is expected for third-party JavaScript widgets (social media buttons, in-app chat, ads, etc) because of legitimate developer concerns around page-size bloat.
Most of those benefits translate into code that is faster to execute. The output is also smaller than, e.g., UglifyJS's, but the main benefit is that the runtime performance is much better.
Let me present another reason for minifying that this thread doesn't seem to mention. In React's codebase we spread around lots of helpful warnings (those warnings are one of React's less talked-about niceties). These go through a processing step where `if (thisIsDev) {expensiveCheckThenWarn()}` turns into `if (false) {expensiveCheckThenWarn()}` for prod, and the minifier then reliably strips the block out completely. For example, all the `propTypes` runtime type checking is gone in prod. Those checks were expensive.
For React Native, for example, I've heard of people going from 15 fps animations to 60 fps when switching to the prod build (!). I always tell them to benchmark their animations on the prod build, because the dev build is _not_ representative of the perf you'll get.
Granted, it doesn't necessarily need to be the minifier's job. But that's the way things are currently done in JS.
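Concretely, the pattern described above looks something like this (an illustrative sketch; in React's real build the dev flag is wired through tooling such as envify or webpack's DefinePlugin rather than a literal `thisIsDev`):

```js
// 1. Source, as written:
if (process.env.NODE_ENV !== 'production') {
  expensiveCheckThenWarn(props);
}

// 2. After the build step inlines the env var for a production build:
if (false) {
  expensiveCheckThenWarn(props);
}

// 3. After the minifier's dead-code elimination: the whole block is gone,
//    so neither the check nor the warning strings ship to users.
```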
If the minifier provides that much more incentive for helpful error messages then I'm all for it. It has helped us avoid a huge amount of duplicate questions when people start with React too. As a matter of fact, I think most libraries should adopt this approach for this feature alone (Webpack makes this process less of a hassle, but we can do better). Otherwise you'd be caught between inserting expensive checks and not inserting them out of perf concern, which would be a true premature optimization.
"Gzipping alone gives you about 70% savings and minification alone cuts script sizes with more than half. Both combined (minifying then gzipping) can make your scripts 85% leaner."
http://www.bookofspeed.com/chapter4.html
Yeah, possibly. This is old research [1] with oldish minifiers (e.g. no UglifyJS) and only one script, jQuery from 5 years ago. However, I don't expect large discrepancies if the research is repeated with newer minifiers and a larger sample.
What would be more interesting is to see the results of the different minifications. E.g. how much do you get from simply removing comments and whitespace? How much if you also rename variables? There might be diminishing returns (and increased security risks) if you go further than those two simple and "safe" improvements.
Minifying alone is "kind of" important, and the benefits are hard to measure precisely with the combination of gzip, http/2, and the plethora of other network optimizations. However, the more sophisticated JavaScript compilers, like Google's Closure Compiler (https://developers.google.com/closure/compiler/) also do a lot of other compiler optimizations, like inlining functions, eliminating dead code, etc.
When you are working on a large engineering team with lots of JavaScript volume written by a lot of different people, the dead code elimination alone is worth it; it means you can use one function from a library with millions of lines and your compiled JavaScript will only contain that individual function rather than many megabytes of unused code.
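A small sketch of why that matters (module and function names made up): with whole-program dead-code elimination, only what's actually reachable from your entry point survives.

```js
// util.js: imagine thousands of functions like these.
function formatDate(d) { return d.toISOString().slice(0, 10); }
function slugify(s)    { return s.toLowerCase().replace(/\s+/g, '-'); }

// app.js: only one of them is ever used.
console.log(slugify('Hello World'));

// The compiled bundle keeps (a renamed, possibly inlined form of) slugify;
// formatDate and everything else unreachable is dropped entirely.
```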
Look at jQuery as an example[0]. gzip only saves you 4-5% over minification alone, but combining both techniques you save a further 20%. If you're not doing both you're leaving a lot on the table.
It's the same story compressing other material. 7-zip has filters to preprocess x86 and IA64 machine code to make it more amenable to LZMA, and such filters can get you a further 10-15%.
1. Gzip is not free. You would have to perform it on every single request. That costs money. Minification is a one time thing.
2. Gzip does not strip documentation.
3. Gzip, as you mentioned, cannot do language specific compression. So if your code is written poorly and can be better written with a closure, Gzip won't do that for you. Proper minification will.
4. Gzipping minified code is cheaper than gzipping non-minified code.
5. Minified code uses a smaller set of variable names and literals, which makes for more efficient gzipping.
As others have mentioned, gzipping alone gives around 50% compression, full minification around 70%, and both combined around 80-90%. So if you go the whole nine yards, you can save an extra 40% on bandwidth. That could equate to thousands of dollars in savings every month for even a moderately popular web app.
You can pre-gzip, or cache the gzipped version; some servers do. In fact, storing the gzipped version and occasionally uncompressing it for clients that can't handle compression is much more efficient than the reverse, since decompressing is much faster than compressing.
You can store both on disk; Apache, at least, will figure out whether the client can handle gzipped content and will send the compressed version when possible.
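A minimal sketch of what pre-compressing at build time can look like (paths made up; uses Node's built-in zlib), so the server only ever serves a ready-made .gz file instead of compressing per request:

```js
const fs = require('fs');
const zlib = require('zlib');

// Compress the minified bundle once, at build/deploy time.
const source = fs.readFileSync('dist/app.min.js');
fs.writeFileSync('dist/app.min.js.gz', zlib.gzipSync(source, { level: 9 }));
```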
But caching in this way sort of defeats the argument. Why would one cache a gzipped copy instead of a minified or minified+gzipped one? And minified alone is more efficient than gzip and works regardless of client preference.
The argument here is why not simply use gzip and not use minification at all.
Minification: a combination of bandwidth savings and less code for the client to handle.
Remember that JavaScript on the client must run once per page load / per script inclusion. The larger the code, the more that must be parsed and executed.
Minification probably had more of an impact before Chrome and V8 kicked off the race toward extremely optimized JavaScript engines. I suppose most vendors now have some sort of in-memory cache for recently seen scripts, which would vastly reduce the average cost of running a script since it needs to be parsed less frequently.
All benefits of sending less data over the wire aside, my understanding was that a significant portion of time during page load is spent actually parsing the js, and that one of the primary benefits of minification is that it reduces this overhead for sufficiently large pages. I notice that nobody has mentioned this, am I wrong?
You're absolutely correct that gzip compression will, in almost all cases, get you nearly all of the speed benefits minification will, and is generally a better method (and easier to do, as well). The reason so much JS, CSS, and HTML gets minified, in my opinion, is entirely a function of Google Pagespeed Insights. Regardless of the actual speed of your site, and regardless of the presence of gzip compression, if you have unminified JS, Google will recommend minifying it. They'll give you an estimate of how much you'll save, but even if it's low, the tool is generally used by average web users (ie, not true developers) who simply take Google's advice on its face.
You can often get a further 10% saving on top of what you get from gzip, and many of us think a 10% saving is not to be scoffed at: not everyone has a high bandwidth connection, and many mobile users are still charged by the megabyte.
When you already have a pipeline like gulp or grunt (and if you are developing JavaScript, you should), adding minification is just one more line of code, as in the sketch after this comment. So why not minify, if it takes essentially no effort and could mean better performance? Also, developers usually don't have access to the production webserver.
Most of the time, people are too busy to analyze all the pros and cons of every step they take.
That said, if Debian is an open-source OS, then yes, they should have access to the unminified code of the applications they include, or at least the source maps.
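For the record, the "one more line" looks roughly like this in a gulp pipeline (assuming the gulp-uglify plugin; task and path names are made up):

```js
const gulp = require('gulp');
const uglify = require('gulp-uglify');

gulp.task('scripts', function () {
  return gulp.src('src/**/*.js')
    .pipe(uglify())            // <- the one extra line that does the minifying
    .pipe(gulp.dest('dist'));
});
```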
Anecdotally I've heard of cases where minifying a script (running through the "uglify2" tool) led to extreme and obvious performance issues (like the UI thread freezing for several beats in cases where unminified JS did not).
In that case the problem isn't caused by minifying the files as such, but by a bug in that particular minifier implementation. I guess...
Yes, it's a classic case of premature optimization.
* As you mentioned, you get most of the benefits from gzip compression.
* You pay a major debugging penalty for having your code minified, since it's much harder to figure out what happened with minified code.
* After the first request your code is going to be cached locally, anyway, so we're only talking about the initial request.
* Using a CDN for your libraries will buy you much more. It's not 100% relevant, but if you're worried about download speed, this is one thing that does make sense.
The main differences are that gzip doesn't strip out the documentation, and that minified code doesn't need to be decompressed. So in some cases the minified version will be smaller and a bit quicker. YMMV. I think the difference is trivial and you're right: using gzip is easier and a bit more transparent.
10% can be a lot. We have plenty of data that conversions go down as page loads take longer, even by tiny amounts that you wouldn't expect to have an effect.
It may seem really marginal, but any time you can save downloading assets on page load is money. Everyone should be using gzip already, but minification shaves off precious milliseconds on top of that.
Except that your typical web site using minification is leaving hundreds or thousands of milliseconds on the table in other ways. They are minifying out of habit, not because they actually measured anything going faster. If they actually measured, then they'd be looking at the number of assets per page.
Minified JS doesn't need decompressing, whereas gzipped JS does, but I'll admit it's probably a minuscule amount of CPU even on a smartphone, especially if you cache the decompressed version.
> To me the problem suggests that it is important from a security and
> accountability perspective to 1) include the human-readable source
> code of JavaScript in Debian packages, and 2) to compile the
> human-readable source code into a minified code (if required) during
> package builds, using a JS-minifier that is included in Debian.
> Thoughts?
This is mandatory in Debian anyway.
I wonder... I'm assuming their concern surrounds stuff that is installed on disk with the package manager. Is there any benefit at all to JS minification when the file:// protocol is in use? Marginal (if at all) improvements to parse time? Anything else? Or is it just "everyone's doing it"?
Apt can be used to install web applications to be hosted from a Debian based server, so just because you're installing a package on a machine doesn't mean the files it installs won't be accessed remotely.
V8 uses a function's length in characters, _including comments_, to make optimization decisions. By removing comments and shortening var identifiers internal to a function, I can imagine a number of functions dropping under that 600-char threshold. Would be great to measure.
Someone in the last thread mentioned that you can actually configure this limit. So it may be slightly more accurate in the ratio when minified but probably not worth it.
That seems like a terrible way to measure things. Why not go by some kind of node count after parsing? Parsing has to happen before optimization anyway.
Nope. Parsing happens inline with execution, unless the function has fewer than 600 characters, in which case it'll be compiled down immediately. If it has any hotspots after that, they'll be JIT-optimized.
It is a terrible way, but you can't deny it's a lot cheaper than rewriting V8 to count nodes.
Agreed, this is awful. Presumably it means that adding comments can slow your code down, which I would otherwise have thought a completely laughable suggestion.
The problem with JIT-compiled languages is that compilation time is as critical as the performance of the compiled code. Using string length happens to be quicker than producing ASTs, so while it is a dirty hack, it offers greater performance more often than not.
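To make the worry above concrete, a purely illustrative sketch (not measured; based on the 600-character heuristic described in this thread):

```js
// Small and comment-free: comfortably under the character cutoff,
// so it is eligible for the cheap "compile immediately / inline" path.
function addSmall(a, b) { return a + b; }

// Identical logic, but imagine several hundred characters of documentation
// here. Since the heuristic counts raw source length, comments included,
// enough padding can push the same tiny function past the cutoff, which is
// exactly what minification (comment stripping, shorter identifiers) undoes.
function addPadded(a, b) {
  return a + b;
}
```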
Given that the most-used minifiers are probably uglify2 and the Closure Compiler, with uglify2 requiring a JS runtime (node/iojs) and Closure requiring Java, this would add some overhead to said build systems. Personally, nvm/iojs are pretty high on my list of installed items, but making them a default build dependency would likely slow adoption of newer versions.
Node has pretty much become the standard for dealing with client-side assets in projects.
The build systems are already building and linking massive C++ cathedrals for other packages, I don't think that running uglify2 on nodejs is going to be a resource issue.
More problematic is that nodejs isn't available on all Debian architectures, due to lack of V8.
I'm not saying it is... and TBH any system servicing web traffic with JS these days likely has node involved. I was just saying that it would be elevated to a core component, and might hinder adoption of newer releases, though it could improve it too.
I'm mainly echoing concerns that were in play when Mono had been gaining a little traction and, as an example, a few useful utilities were added to Ubuntu's release, bringing in a large runtime. In this case it's not quite as big, but node_modules can get fairly large depending on what you're working on (much better with an SSD).
It's hard to see how shipping minified JavaScript in the source packages passed muster with Debian in the first place. If it's not the human-readable source that's in the preferred form for modifications, it's not really the source at all.
"If a JavaScript library typically is shipped as minified or compiled code, it must be compiled or minified as part of the RPM build process. Shipping pre-minified or pre-compiled code is unacceptable in Fedora."
They say that, but it is not always the case. See Roundcube, for example: they just shipped the minified JS as provided by upstream. In Debian, the pre-minification JS files are shipped and minified during the build process.
One argument that really drives it home -- GZIP compression without minification is faster and better, and from my own experience takes exactly two lines of nginx config to set up.
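For comparison, the application-level equivalent in a Node/Express setup is about as short (a sketch assuming the `compression` middleware; the nginx version the parent means is essentially `gzip on;` plus a `gzip_types` line listing the MIME types to compress):

```js
const express = require('express');
const compression = require('compression');

const app = express();
app.use(compression());            // gzip responses, JS assets included
app.use(express.static('public')); // serve the unminified files as-is
app.listen(3000);
```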
Is there really any benefit of using minification besides code obfuscation?
Define "faster." GZIP creates a smaller file size, but the benefit of minification is faster parsing by the Javascript engine -- GZIP does nothing to help with this.
So yeah, they optimize for space, but they also optimize for execution. For example, the Closure minifier puts everything on one line, drops semicolons where possible, and uses the comma operator, which makes the JavaScript engine evaluate it as a single statement, and that can be faster.
That's just a minor point but it can help with speed. Is it enough of an optimization that it's worth it though? I'm not sure; I haven't seen much in the way of benchmarks in this area.
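For illustration, the single-statement style described above looks roughly like this (variable names made up; whether it's measurably faster is, as noted, an open question):

```js
// Before minification (count and total declared elsewhere):
count = 0;
total = 0;
render();

// After: the comma operator folds the three statements into one.
count=0,total=0,render();
```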
UglifyJS does a lot of dead-code elimination. Lots of stuff that only runs in dev environments isn't even included in the production build. Stuff like:

    if (process.env.NODE_ENV !== 'production') {
      invariant(x, y);
      console.info(z);
    }
All of the interesting content is in the link provided in the list post [1]; otherwise this is just a post about how Debian should handle the security issues presented there.
This might be a good candidate for having the link updated.
The moral of the story, it seems, is to beware any tool that modifies or writes your code for you, even something as seemingly harmless as a JS minifier.
Yes, they do: you take a C compiler (written in assembly) that is just good enough to compile TCC, you verify that the source of TCC and GCC is right, you compile TCC with your simple compiler, then compile GCC with TCC, and you have a compiler which is guaranteed to be free of such backdoors.
Now, you compile your code with this compiler, and if your own build reproducibly has the same results as this verified compiler, you're safe.
You are thinking of another kind of backdoor than the one being discussed. The backdooring technique being discussed relies on a compiler bug hidden in plain sight in the compiler's source code. It has not been sneaked into the binary with a “trusting trust”-like technique. The bug is just an ordinary compiler bug, you do not need to be the one who put it there, it can be a known, published bug and have already been fixed in the compiler's development version (although you can also have found the compiler bug yourself and have omitted to report it for maximum sneakiness).
Re-compiling the compiler does not make the bug go away. Compiling GCC with TCC does not fix bugs in GCC (otherwise people would be doing it more often).
Yes – the only attack my solution does not apply against is if the compiler actually has a backdoor. But debian assumes their compilers do not. Which is the premise for this whole discussion.
No. The discussion is about compiler bugs. The attack “your” solution does not apply against is if the compiler actually has a bug.
Compiler have bugs, and some of these bugs cause them to silently emit the wrong assembly code for the source program passed to them. One of the authors of the original article that inspired bcrypt's blog post reported hundreds of bugs in Clang and GCC, of which about half are “wrong code” bugs: https://github.com/csmith-project/csmith/blob/master/BUGS_RE...
You are right that Debian and other distributions assume that compilers do not have “wrong code” bugs, though. The only problem is that this is not true.
The whole point of the backdooring-JS blog post was that you can create "deniable" bugs that are hard, if not impossible, to see by inspecting the unminified source, and where intent can't be proven. They already have great reasons to require source, so it's ironic this would be the catalyst to start, no?
Conceptually, this is a reasonable line to draw in the sand. Minified Javascript should be thought of as equivalent to assembly language or Java bytecode[1], and if the policy is to not ship compiled binaries without source available, minified JS should fall into that category.
[1] https://www.destroyallsoftware.com/talks/the-birth-and-death... is funny, but the honest truth is that it's likely the future of web technologies, as Javascript is so deeply entrenched in the fiber of the web standards and the practical implementation of the browsers.
Is the AST structure preserved when minifying? If only identifier names and whitespace changed, then it seems like one could verify that the minified matches the original.
IMHO that's going into optimiser territory, and it's perhaps the reason why these minifier bugs exist: the minifier writers need to be very sure of the semantics of the language before attempting to rewrite code to have the same behaviour, and JS/ECMAScript semantics are not trivial, especially w.r.t. the (highly) dynamic type system, as the example shows. Even renaming things is not as trivial as it seems, because the names can themselves be dynamic, e.g. a.somefield vs a['s' + 'ome' + 'field'].
(I'm not a JS developer so I might've missed a few other things, but minifying JS is definitely a much harder problem than doing it with something like C.)
UglifyJS also rewrites property names if you tell it to. It's conservative so it only replaces the names you explicitly tell it to rename. Recently it gained the ability to let you give a regex, so if you have a convention of always prefixing private rename-safe properties with an underscore, you can now have those renamed.
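The option described above looks roughly like this in recent uglify-js releases (a sketch; the exact API has shifted between major versions, and older versions exposed it via CLI flags instead):

```js
var UglifyJS = require('uglify-js');

var code = 'function T(){ this._cache = {}; this.publicField = 1; } new T();';

var result = UglifyJS.minify(code, {
  mangle: {
    properties: { regex: /^_/ }  // only rename underscore-prefixed properties
  }
});

console.log(result.code);  // _cache gets mangled; publicField is left alone
```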
This thread on V8 inlining was buried at the bottom of the blog post linked in this thread, but it was super valuable and is changing the way I write functions in Node this morning: https://news.ycombinator.com/item?id=10108672
You can trust the minifier as much as you want, but it can still have bugs in it. Debian trusts Firefox (Iceweasel), but that doesn't mean it's not full of security holes!
What would be the point of exploiting the minifier? The purpose is to get the code to the Debian users, so how does exploiting the minifier help with that? If you can put an exploit in the program's source, you just want it to be minified as-is so that it'll run on the end users' machines.
It has the answer to your question. Namely, you can write bugs that are exploitable that aren't present in the original source, that only appear in the minified output. Which means that a) it is a whole lot harder for someone to find (especially if it's something that is "obviously" correct), and b) it's plausibly deniable.
No, you want the minified source to do something different but predictable so that nobody reading the source could spot the backdoor without knowledge of the bug. Here's an example:
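As a purely hypothetical stand-in (not the parent's actual example), imagine a buggy minifier that treats a double negation as redundant:

```js
// Source as reviewers read it; assume config.strictMode is the string "yes".
if (!!config.strictMode == true) {
  runSecurityChecks(request);   // runs: !!"yes" == true is true
}

// Hypothetical buggy output after the minifier drops the "redundant" !!:
if (config.strictMode == true) {
  runSecurityChecks(request);   // skipped: "yes" == true is false
}

// The readable source behaves correctly; only the shipped, minified build
// silently skips the check, and the author can plausibly blame the minifier.
```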